Commit Graph

67 Commits

Author SHA1 Message Date
Cesar Philippidis
2049befdd0 [nvptx] Remove use of CUDA unified memory in libgomp
libgomp/
	* plugin/plugin-nvptx.c (struct cuda_map): New.
	(struct ptx_stream): Replace d, h, h_begin, h_end, h_next, h_prev,
	h_tail with (cuda_map *) map.
	(cuda_map_create): New function.
	(cuda_map_destroy): New function.
	(map_init): Update to use a linked list of cuda_map objects.
	(map_fini): Likewise.
	(map_pop): Likewise.
	(map_push): Likewise.  Return CUdeviceptr instead of void.
	(init_streams_for_device): Remove stales references to ptx_stream
	members.
	(select_stream_for_async): Likewise.
	(nvptx_exec): Update call to map_init.

From-SVN: r264397
2018-09-18 08:41:54 -07:00
Cesar Philippidis
bd9b3d3d1a [nvptx] Use CUDA driver API to select default runtime launch geometry
The CUDA driver API starting version 6.5 offers a set of runtime functions to
calculate several occupancy-related measures, as a replacement for the occupancy
calculator spreadsheet.

This patch adds a heuristic for default runtime launch geometry, based on the
new runtime function cuOccupancyMaxPotentialBlockSize.

Build on x86_64 with nvptx accelerator and ran libgomp testsuite.

2018-08-13  Cesar Philippidis  <cesar@codesourcery.com>
	    Tom de Vries  <tdevries@suse.de>

	PR target/85590
	* plugin/cuda/cuda.h (CUoccupancyB2DSize): New typedef.
	(cuOccupancyMaxPotentialBlockSize): Declare.
	* plugin/cuda-lib.def (cuOccupancyMaxPotentialBlockSize): New
	CUDA_ONE_CALL_MAYBE_NULL.
	* plugin/plugin-nvptx.c (CUDA_VERSION < 6050): Define
	CUoccupancyB2DSize and declare
	cuOccupancyMaxPotentialBlockSize.
	(nvptx_exec): Use cuOccupancyMaxPotentialBlockSize to set the
	default num_gangs and num_workers when the driver supports it.

Co-Authored-By: Tom de Vries <tdevries@suse.de>

From-SVN: r263505
2018-08-13 12:04:24 +00:00
Tom de Vries
8e09a12f01 [libgomp, nvptx] Fall back to cuLinkAddData/cuLinkCreate if _v2 not found
Cuda driver api functions cuLinkAddData and cuLinkCreate are available starting
version 5.5.  In version 6.5, they are remapped onto _v2 versions.

The dlopen interface of the libgomp nvptx plugin uses the _v2 versions, so it
won't work with a cuda driver with driver api version lower than 6.5.

This patch fixes the problem by testing for the presence of the _v2 versions,
and falling back to the original versions in case of absence of the _v2
versions.

Build on x86_64 with nvptx accelerator and reg-tested libgomp, both with and
without --without-cuda-driver.

2018-08-08  Tom de Vries  <tdevries@suse.de>

	* plugin/cuda-lib.def (cuLinkAddData_v2, cuLinkCreate_v2): Declare using
	CUDA_ONE_CALL_MAYBE_NULL.
	* plugin/plugin-nvptx.c (cuLinkAddData, cuLinkCreate): Undef and declare.
	(cuLinkAddData_v2, cuLinkCreate_v2): Declare.
	(link_ptx): Fall back to cuLinkAddData/cuLinkCreate if the _v2 versions
	are not found.

From-SVN: r263408
2018-08-08 14:26:37 +00:00
Tom de Vries
cedd9bd016 [libgomp, nvptx] Allow cuGetErrorString to be NULL
Cuda driver api function cuGetErrorString is available in version 6.0 and
higher.

Currently, when the driver that is used does not contain this function, the
libgomp nvptx plugin will not build (PLUGIN_NVPTX_DYNAMIC == 0) or run
(PLUGIN_NVPTX_DYNAMIC == 1).

This patch fixes this problem by testing for the presence of the function, and
handling absence.

Build on x86_64 with nvptx accelerator and reg-tested libgomp, both with and
without --without-cuda-driver.

2018-08-08  Tom de Vries  <tdevries@suse.de>

	* plugin/cuda-lib.def (cuGetErrorString): Use CUDA_ONE_CALL_MAYBE_NULL.
	* plugin/plugin-nvptx.c (cuda_error): Handle if cuGetErrorString is not
	present.

From-SVN: r263407
2018-08-08 14:26:28 +00:00
Tom de Vries
b113af959c [libgomp, nvptx] Remove hard-coded const in nvptx_open_device
CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR is defined in cuda driver
api version 6.0 and higher.

Currently nvptx_open_device uses a hard-coded constant instead.

This patch fixes that by:
- defining CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR to the hardcoded
  constant at toplevel, if not present in cuda.h, and
- using CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR in nvptx_open_device

Build on x86_64 with nvptx accelerator and reg-tested libgomp.

2018-08-08  Tom de Vries  <tdevries@suse.de>

	* plugin/plugin-nvptx.c
	(CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR): Define.
	(nvptx_open_device): Use
	CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR.

From-SVN: r263406
2018-08-08 14:26:19 +00:00
Tom de Vries
94767dacea [libgomp, nvptx] Note that cuGetErrorString is in CUDA_VERSION >= 6000
Cuda driver api function cuGetErrorString is available in version 6.0 and
higher.

This patch:
- removes a comment saying the declaration is not available in cuda.h 6.0
- fixes the presence test to use CUDA_VERSION < 6000
- moves the declaration to toplevel

Build on x86_64 with nvptx accelerator and reg-tested libgomp.

2018-08-08  Tom de Vries  <tdevries@suse.de>

	* plugin/plugin-nvptx.c (cuda_error): Move declaration of cuGetErrorString ...
	(cuGetErrorString): ... here.  Guard with CUDA_VERSION < 6000.

From-SVN: r263405
2018-08-08 14:26:10 +00:00
Tom de Vries
02150de863 [libgomp, nvptx] Handle CUDA_ONE_CALL_MAYBE_NULL
This patch adds handling of functions that may not be present in the cuda
driver.

Such a function can be declared using CUDA_ONE_CALL_MAYBE_NULL in cuda-lib.def,
it can be called with the usual convenience macros, but before calling its
presence needs to be tested using new macro CUDA_CALL_EXISTS.

When using the dlopen interface (PLUGIN_NVPTX_DYNAMIC == 1), we allow
non-present functions by allowing dlsym to return NULL.  Otherwise
(PLUGIN_NVPTX_DYNAMIC == 0) we declare the non-present function to be weak.

Build and reg-tested libgomp on x86_64 with nvidia accelerator, with and without
--disable-cuda-driver, in combination with a trigger patch that adds a
non-existing function foo to cuda-lib.def:
...
CUDA_ONE_CALL_MAYBE_NULL (foo)
...
and declares it in plugin-nvptx.c:
...
CUresult foo (void);
...
and then uses it in nvptx_init after the init_cuda_lib call:
...
  if (CUDA_CALL_EXISTS (foo))
    CUDA_CALL (foo);
...

Also build and reg-tested on x86_64 with nvidia accelerator, with and without
--disable-cuda-driver, in combination with a trigger patch that replaces all
CUDA_ONE_CALLs in cuda-lib.def with CUDA_ONE_CALL_MAYBE_NULL, and guards two
CUDA_CALLs with CUDA_CALL_EXISTS, one for a regular fn, and one for a fn that is
a define in cuda/cuda.h.

2018-08-07  Tom de Vries  <tdevries@suse.de>

	* plugin/plugin-nvptx.c (DO_PRAGMA): Define.
	(struct cuda_lib_s): Add def/undef of CUDA_ONE_CALL_MAYBE_NULL.
	(init_cuda_lib): Add new param to CUDA_ONE_CALL_1.  Add arg to
	corresponding call in CUDA_ONE_CALL.  Add def/undef of
	CUDA_ONE_CALL_MAYBE_NULL.
	(CUDA_CALL_EXISTS): Define.

From-SVN: r263346
2018-08-06 22:13:56 +00:00
Tom de Vries
9e28b10779 [libgomp, nvptx] Minimize lifetime of CUDA_ONE_CALL defines
This patch makes sure that the lifetimes of the CUDA_ONE_CALL macro (which is
defined twice in plugin-nvptx.c) are minimized, to make it obvious that the
definitions are used only in the lib-cuda.def include.

Build on x86_64 with nvptx accelerator and reg-tested libgomp.

2018-08-07  Tom de Vries  <tdevries@suse.de>

	* plugin/plugin-nvptx.c (struct cuda_lib_s, init_cuda_lib): Put
	CUDA_ONE_CALL defines right before the cuda-lib.def include, and the
	corresponding undefs right after.

From-SVN: r263345
2018-08-06 22:13:46 +00:00
Tom de Vries
099400909e [libgomp, nvptx, --without-cuda-driver] Don't use system cuda driver
Using libgomp configure option --with-cuda-driver=<dir> we can indicate what
cuda driver to use to build the libgomp nvptx plugin.  Without such an option,
the system cuda driver is used, if available.  If not availabe, a dlopen
interface is used instead.

However, when we use --without-cuda-driver (or the equivalent
--with-cuda-driver=no) the system cuda driver is still used if available.

This patch fixes that, making sure that --without-cuda-driver selects the dlopen
interface.

Build on x86_64 with nvptx accelerator and tested libgomp testsuite, with and
without option --without-cuda-driver.

2018-08-04  Tom de Vries  <tdevries@suse.de>

	* plugin/configfrag.ac: For --without-cuda-driver, set
	CUDA_DRIVER_INCLUDE and CUDA_DRIVER_LIB to no.  Handle
	CUDA_DRIVER_INCLUDE == no and CUDA_DRIVER_LIB == no.
	* configure: Regenerate.

From-SVN: r263310
2018-08-04 20:07:22 +00:00
Cesar Philippidis
094db6beb9 [PATCH] Remove use of 'struct map' from plugin (nvptx)
libgomp/
	* plugin/plugin-nvptx.c (struct map): Removed.
	(map_init, map_pop): Remove use of struct map. (map_push):
	Likewise and change argument list.
	* testsuite/libgomp.oacc-c-c++-common/mapping-1.c: New

Co-Authored-By: James Norris <jnorris@codesourcery.com>

From-SVN: r263212
2018-08-01 07:09:56 -07:00
Tom de Vries
8c6310a2c2 [libgomp, nvptx] Add cuda-lib.def
2018-08-01  Tom de Vries  <tdevries@suse.de>

	* plugin/cuda-lib.def: New file.  Factor out of ...
	* plugin/plugin-nvptx.c (CUDA_CALLS): ... here.
	(struct cuda_lib_s, init_cuda_lib): Include cuda-lib.def instead of
	using CUDA_CALLS.

From-SVN: r263208
2018-08-01 13:20:22 +00:00
Tom de Vries
4cdfee3f20 [libgomp, nvptx] Handle per-function max-threads-per-block in default dims
Currently parallel-loop-1.c fails at -O0 on a Quadro M1200, because one of the
kernel launch configurations exceeds the resources available in the device, due
to the default dimensions chosen by the runtime.

This patch fixes that by taking the per-function max_threads_per_block into
account when using the default dimensions.

2018-07-30  Tom de Vries  <tdevries@suse.de>

	* plugin/plugin-nvptx.c (MIN, MAX): Redefine.
	(nvptx_exec): Ensure worker and vector default dims don't exceed
	targ_fn->max_threads_per_block.

From-SVN: r263062
2018-07-30 08:17:26 +00:00
Tom de Vries
0b210c43bb [libgomp, nvptx] Calculate default dims per device
The default dimensions are calculated using per-device properties, but
initialized once and used on all devices.

This patch fixes this problem by introducing per-device default dimensions.

2018-07-30  Tom de Vries  <tdevries@suse.de>

	* plugin/plugin-nvptx.c (struct ptx_device): Add default_dims field.
	(nvptx_open_device): Init default_dims for device.
	(nvptx_exec): Use default_dims from device.

From-SVN: r263061
2018-07-30 08:17:16 +00:00
Cesar Philippidis
88a4654d03 [libgomp, nvptx] Add error with recompilation hint for launch failure
Currently, when a kernel is lauched with too many workers, it results in a cuda
launch failure.  This is triggered f.i. for parallel-loop-1.c at -O0 on a Quadro
M1200.

This patch detects this situation, and errors out with a hint on how to fix it.

Build and reg-tested on x86_64 with nvptx accelerator.

2018-07-26  Cesar Philippidis  <cesar@codesourcery.com>
	    Tom de Vries  <tdevries@suse.de>

	* plugin/plugin-nvptx.c (nvptx_exec): Error if the hardware doesn't have
	sufficient resources to launch a kernel, and give a hint on how to fix
	it.

Co-Authored-By: Tom de Vries <tdevries@suse.de>

From-SVN: r262997
2018-07-26 11:42:29 +00:00
Cesar Philippidis
0c6c2f5fc2 [libgomp, nvptx] Move device property sampling from nvptx_exec to nvptx_open
Move sampling of device properties from nvptx_exec to nvptx_open, and assume
the sampling always succeeds.  This simplifies the default dimension
initialization code in nvptx_open.

2018-07-26  Cesar Philippidis  <cesar@codesourcery.com>
	    Tom de Vries  <tdevries@suse.de>

	* plugin/plugin-nvptx.c (struct ptx_device): Add warp_size,
	max_threads_per_block and max_threads_per_multiprocessor fields.
	(nvptx_open_device): Initialize new fields.
	(nvptx_exec): Use num_sms, and new fields.

Co-Authored-By: Tom de Vries <tdevries@suse.de>

From-SVN: r262996
2018-07-26 11:42:19 +00:00
Tom de Vries
ec00d3faf4 [openacc] Move GOMP_OPENACC_DIM parsing out of nvptx plugin
2018-05-02  Tom de Vries  <tom@codesourcery.com>

	PR libgomp/85411
	* plugin/plugin-nvptx.c (nvptx_exec): Move parsing of
	GOMP_OPENACC_DIM ...
	* env.c (parse_gomp_openacc_dim): ... here.  New function.
	(initialize_env): Call parse_gomp_openacc_dim.
	(goacc_default_dims): Define.
	* libgomp.h (goacc_default_dims): Declare.
	* oacc-plugin.c (GOMP_PLUGIN_acc_default_dim): New function.
	* oacc-plugin.h (GOMP_PLUGIN_acc_default_dim): Declare.
	* libgomp.map: New version "GOMP_PLUGIN_1.2". Add
	GOMP_PLUGIN_acc_default_dim.
	* testsuite/libgomp.oacc-c-c++-common/loop-default-runtime.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-default.h: New test.

From-SVN: r259852
2018-05-02 17:53:56 +00:00
Tom de Vries
df36a3d3be [nvptx, libgomp] Add GOMP_NVPTX_JIT=-O[0-4] in nvptx libgomp plugin
2018-04-26  Tom de Vries  <tom@codesourcery.com>

	PR libgomp/84020
	* plugin/cuda/cuda.h (CUjit_option): Add CU_JIT_OPTIMIZATION_LEVEL.
	* plugin/plugin-nvptx.c (_GNU_SOURCE): Define.
	(process_GOMP_NVPTX_JIT): New function.
	(link_ptx): Use process_GOMP_NVPTX_JIT.

From-SVN: r259678
2018-04-26 13:27:04 +00:00
Jakub Jelinek
85ec4feb11 Update copyright years.
From-SVN: r256169
2018-01-03 11:03:58 +01:00
Tom de Vries
12e9c8ce6c Remove semicolon after do {} while (false) in HSA_LOG
2017-10-31  Tom de Vries  <tom@codesourcery.com>

	* plugin/plugin-hsa.c (HSA_LOG): Remove semicolon after
	"do {} while (false)".
	(init_single_kernel, GOMP_OFFLOAD_async_run): Add missing semicolon
	after HSA_DEBUG call.

From-SVN: r254264
2017-10-31 12:56:41 +00:00
Tom de Vries
9607b014b2 Fix secure_getenv.h include in plugin-hsa.c
2017-07-03  Tom de Vries  <tom@codesourcery.com>

	* plugin/plugin-hsa.c: Fix secure_getenv.h include.

From-SVN: r249918
2017-07-03 13:40:19 +00:00
Tom de Vries
dfb15f6bbb Show value of GOMP_OPENACC_DIM in libgomp nvptx plugin
2017-06-27  Tom de Vries  <tom@codesourcery.com>

	* plugin/plugin-nvptx.c (notify_var): New function.
	(nvptx_exec): Use notify_var for GOMP_OPENACC_DIM.

From-SVN: r249695
2017-06-27 15:51:48 +00:00
Tom de Vries
22f1a03704 Use secure_getenv for GOMP_DEBUG
2017-06-27  Tom de Vries  <tom@codesourcery.com>

	* env.c (parse_unsigned_long_1): Factor out of ...
	(parse_unsigned_long): ... here.
	(parse_int_1): Factor out of ...
	(parse_int): ... here.
	(parse_int_secure): New function.
	(initialize_env): Use parse_int_secure for GOMP_DEBUG.
	* secure_getenv.h: Factor out of ...
	* plugin/plugin-hsa.c: ... here.
	* testsuite/libgomp.oacc-c-c++-common/gomp-debug-env.c: New test.

From-SVN: r249694
2017-06-27 15:51:37 +00:00
Thomas Schwinge
78672bd8fd libgomp nvptx plugin: Debugging output when disabling nvptx offloading
libgomp/
	* plugin/plugin-nvptx.c (nvptx_get_num_devices): Debugging output
	when disabling nvptx offloading.

From-SVN: r248400
2017-05-24 08:59:05 +02:00
Thomas Schwinge
0da2f96af0 libgomp hsa plugin: debug output for HSA runtime library loading failure
libgomp/
	* plugin/plugin-hsa.c (DLSYM_FN, init_hsa_runtime_functions):
	Debug output for failure.

From-SVN: r248277
2017-05-19 15:32:04 +02:00
Jakub Jelinek
19929ba9c9 plugin-nvptx.c (cuda_lib_inited): Use signed char type instead of char.
* plugin/plugin-nvptx.c (cuda_lib_inited): Use signed char type
	instead of char.

From-SVN: r246918
2017-04-13 21:59:04 +02:00
Thomas Schwinge
e70ab10d5c libgomp, nvptx plugin: Make "nvptx_exec" static
libgomp/
	* plugin/plugin-nvptx.c (nvptx_exec): Make it static.

From-SVN: r245127
2017-02-02 15:35:30 +01:00
Thomas Schwinge
345a8c1712 libgomp: Normalize the names of a few functions of the libgomp plugin API
libgomp/
	* libgomp-plugin.h (GOMP_OFFLOAD_openacc_parallel): Rename to
	GOMP_OFFLOAD_openacc_exec.  Adjust all users.
	(GOMP_OFFLOAD_openacc_get_current_cuda_device): Rename to
	GOMP_OFFLOAD_openacc_cuda_get_current_device.  Adjust all users.
	(GOMP_OFFLOAD_openacc_get_current_cuda_context): Rename to
	GOMP_OFFLOAD_openacc_cuda_get_current_context.  Adjust all users.
	(GOMP_OFFLOAD_openacc_get_cuda_stream): Rename to
	GOMP_OFFLOAD_openacc_cuda_get_stream.  Adjust all users.
	(GOMP_OFFLOAD_openacc_set_cuda_stream): Rename to
	GOMP_OFFLOAD_openacc_cuda_set_stream.  Adjust all users.

From-SVN: r245125
2017-02-02 15:13:57 +01:00
Thomas Schwinge
dced339c8a libgomp: Provide prototypes for functions implemented by libgomp plugins
libgomp/
	* libgomp-plugin.h: #include <stdbool.h>.
	(GOMP_OFFLOAD_get_name, GOMP_OFFLOAD_get_caps)
	(GOMP_OFFLOAD_get_type, GOMP_OFFLOAD_get_num_devices)
	(GOMP_OFFLOAD_init_device, GOMP_OFFLOAD_fini_device)
	(GOMP_OFFLOAD_version, GOMP_OFFLOAD_load_image)
	(GOMP_OFFLOAD_unload_image, GOMP_OFFLOAD_alloc, GOMP_OFFLOAD_free)
	(GOMP_OFFLOAD_dev2host, GOMP_OFFLOAD_host2dev)
	(GOMP_OFFLOAD_dev2dev, GOMP_OFFLOAD_can_run, GOMP_OFFLOAD_run)
	(GOMP_OFFLOAD_async_run, GOMP_OFFLOAD_openacc_parallel)
	(GOMP_OFFLOAD_openacc_register_async_cleanup)
	(GOMP_OFFLOAD_openacc_async_test)
	(GOMP_OFFLOAD_openacc_async_test_all)
	(GOMP_OFFLOAD_openacc_async_wait)
	(GOMP_OFFLOAD_openacc_async_wait_async)
	(GOMP_OFFLOAD_openacc_async_wait_all)
	(GOMP_OFFLOAD_openacc_async_wait_all_async)
	(GOMP_OFFLOAD_openacc_async_set_async)
	(GOMP_OFFLOAD_openacc_create_thread_data)
	(GOMP_OFFLOAD_openacc_destroy_thread_data)
	(GOMP_OFFLOAD_openacc_get_current_cuda_device)
	(GOMP_OFFLOAD_openacc_get_current_cuda_context)
	(GOMP_OFFLOAD_openacc_get_cuda_stream)
	(GOMP_OFFLOAD_openacc_set_cuda_stream): New prototypes.
	* libgomp.h (struct acc_dispatch_t, struct gomp_device_descr): Use
	these.
	* plugin/plugin-hsa.c (GOMP_OFFLOAD_load_image)
	(GOMP_OFFLOAD_unload_image): Fix argument types.
	liboffloadmic/
	* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_get_type): Fix
	return type.
	(GOMP_OFFLOAD_load_image): Fix argument types.

From-SVN: r245062
2017-01-31 15:32:58 +01:00
Pekka Jääskeläinen
5fd1486ce5 Brig front-end
2017-01-24  Pekka Jääskeläinen <pekka@parmance.com>
	    Martin Jambor  <mjambor@suse.cz>

	* Makefile.def (target_modules): Added libhsail-rt.
	(languages): Added language brig.
	* Makefile.in: Regenerated.
	* configure.ac (TOPLEVEL_CONFIGURE_ARGUMENTS): Added
	tgarget-libhsail-rt.  Make brig unsupported on untested architectures.
	* configure: Regenerated.

gcc/
	* brig-builtins.def: New file.
	* builtins.def (DEF_HSAIL_BUILTIN): New macro.
	(DEF_HSAIL_ATOMIC_BUILTIN): Likewise.
	(DEF_HSAIL_SAT_BUILTIN): Likewise.
	(DEF_HSAIL_INTR_BUILTIN): Likewise.
	(DEF_HSAIL_CVT_ZEROI_SAT_BUILTIN): Likewise.
	* builtin-types.def (BT_INT8): New.
	(BT_INT16): Likewise.
	(BT_UINT8): Likewise.
	(BT_UINT16): Likewise.
	(BT_FN_ULONG): Likewise.
	(BT_FN_UINT_INT): Likewise.
	(BT_FN_UINT_ULONG): Likewise.
	(BT_FN_UINT_LONG): Likewise.
	(BT_FN_UINT_PTR): Likewise.
	(BT_FN_ULONG_PTR): Likewise.
	(BT_FN_INT8_FLOAT): Likewise.
	(BT_FN_INT16_FLOAT): Likewise.
	(BT_FN_UINT32_FLOAT): Likewise.
	(BT_FN_UINT16_FLOAT): Likewise.
	(BT_FN_UINT8_FLOAT): Likewise.
	(BT_FN_UINT64_FLOAT): Likewise.
	(BT_FN_UINT16_UINT32): Likewise.
	(BT_FN_UINT32_UINT16): Likewise.
	(BT_FN_UINT16_UINT16_UINT16): Likewise.
	(BT_FN_INT_PTR_INT): Likewise.
	(BT_FN_UINT_PTR_UINT): Likewise.
	(BT_FN_LONG_PTR_LONG): Likewise.
	(BT_FN_ULONG_PTR_ULONG): Likewise.
	(BT_FN_VOID_UINT64_UINT64): Likewise.
	(BT_FN_UINT8_UINT8_UINT8): Likewise.
	(BT_FN_INT8_INT8_INT8): Likewise.
	(BT_FN_INT16_INT16_INT16): Likewise.
	(BT_FN_INT_INT_INT): Likewise.
	(BT_FN_UINT_FLOAT_UINT): Likewise.
	(BT_FN_FLOAT_UINT_UINT): Likewise.
	(BT_FN_ULONG_UINT_UINT): Likewise.
	(BT_FN_ULONG_UINT_PTR): Likewise.
	(BT_FN_ULONG_ULONG_ULONG): Likewise.
	(BT_FN_UINT_UINT_UINT): Likewise.
	(BT_FN_VOID_UINT_PTR): Likewise.
	(BT_FN_UINT_UINT_PTR: Likewise.
	(BT_FN_UINT32_UINT64_PTR): Likewise.
	(BT_FN_INT_INT_UINT_UINT): Likewise.
	(BT_FN_UINT_UINT_UINT_UINT): Likewise.
	(BT_FN_UINT_UINT_UINT_PTR): Likewise.
	(BT_FN_UINT_ULONG_ULONG_UINT): Likewise.
	(BT_FN_ULONG_ULONG_ULONG_ULONG): Likewise.
	(BT_FN_LONG_LONG_UINT_UINT): Likewise.
	(BT_FN_ULONG_ULONG_UINT_UINT): Likewise.
	(BT_FN_VOID_UINT32_UINT64_PTR): Likewise.
	(BT_FN_VOID_UINT32_UINT32_PTR): Likewise.
	(BT_FN_UINT_UINT_UINT_UINT_UINT): Likewise.
	(BT_FN_UINT_FLOAT_FLOAT_FLOAT_FLOAT): Likewise.
	(BT_FN_ULONG_ULONG_ULONG_UINT_UINT): Likewise.
	* doc/frontends.texi: List BRIG FE.
	* doc/install.texi (Testing): Add BRIG tesring requirements.
	* doc/invoke.texi (Overall Options): Mention BRIG.
	* doc/standards.texi (Standards): Doucment BRIG HSA version.

gcc/brig/

	* Make-lang.in: New file.
	* brig-builtins.h: Likewise.
	* brig-c.h: Likewise.
	* brig-lang.c: Likewise.
	* brigspec.c: Likewise.
	* config-lang.in: Likewise.
	* lang-specs.h: Likewise.
	* lang.opt: Likewise.
	* brigfrontend/brig-arg-block-handler.cc: Likewise.
	* brigfrontend/brig-atomic-inst-handler.cc: Likewise.
	* brigfrontend/brig-basic-inst-handler.cc: Likewise.
	* brigfrontend/brig-branch-inst-handler.cc: Likewise.
	* brigfrontend/brig-cmp-inst-handler.cc: Likewise.
	* brigfrontend/brig-code-entry-handler.cc: Likewise.
	* brigfrontend/brig-code-entry-handler.h: Likewise.
	* brigfrontend/brig-comment-handler.cc: Likewise.
	* brigfrontend/brig-control-handler.cc: Likewise.
	* brigfrontend/brig-copy-move-inst-handler.cc: Likewise.
	* brigfrontend/brig-cvt-inst-handler.cc: Likewise.
	* brigfrontend/brig-fbarrier-handler.cc: Likewise.
	* brigfrontend/brig-function-handler.cc: Likewise.
	* brigfrontend/brig-function.cc: Likewise.
	* brigfrontend/brig-function.h: Likewise.
	* brigfrontend/brig-inst-mod-handler.cc: Likewise.
	* brigfrontend/brig-label-handler.cc: Likewise.
	* brigfrontend/brig-lane-inst-handler.cc: Likewise.
	* brigfrontend/brig-machine.c: Likewise.
	* brigfrontend/brig-machine.h: Likewise.
	* brigfrontend/brig-mem-inst-handler.cc: Likewise.
	* brigfrontend/brig-module-handler.cc: Likewise.
	* brigfrontend/brig-queue-inst-handler.cc: Likewise.
	* brigfrontend/brig-seg-inst-handler.cc: Likewise.
	* brigfrontend/brig-signal-inst-handler.cc: Likewise.
	* brigfrontend/brig-to-generic.cc: Likewise.
	* brigfrontend/brig-to-generic.h: Likewise.
	* brigfrontend/brig-util.cc: Likewise.
	* brigfrontend/brig-util.h: Likewise.
	* brigfrontend/brig-variable-handler.cc: Likewise.
	* brigfrontend/phsa.h: Likewise.


gcc/testsuite/

	* lib/brig-dg.exp: New file.
	* lib/brig.exp: Likewise.
	* brig.dg/README: Likewise.
	* brig.dg/dg.exp: Likewise.
	* brig.dg/test/gimple/alloca.hsail: Likewise.
	* brig.dg/test/gimple/atomics.hsail: Likewise.
	* brig.dg/test/gimple/branches.hsail: Likewise.
	* brig.dg/test/gimple/fbarrier.hsail: Likewise.
	* brig.dg/test/gimple/function_calls.hsail: Likewise.
	* brig.dg/test/gimple/kernarg.hsail: Likewise.
	* brig.dg/test/gimple/mem.hsail: Likewise.
	* brig.dg/test/gimple/mulhi.hsail: Likewise.
	* brig.dg/test/gimple/packed.hsail: Likewise.
	* brig.dg/test/gimple/smoke_test.hsail: Likewise.
	* brig.dg/test/gimple/variables.hsail: Likewise.
	* brig.dg/test/gimple/vector.hsail: Likewise.

include/

	* hsa.h: Moved here from libgomp/plugin/hsa.h.

libgomp/

	* plugin/hsa.h: Moved to top level include.
	* plugin/plugin-hsa.c: Chanfgd include of hsa.h accordingly.

libhsail-rt/

	* Makefile.am: New file.
	* target-config.h.in: Likewise.
	* configure.ac: Likewise.
	* configure: Likewise.
	* config.h.in: Likewise.
	* aclocal.m4: Likewise.
	* README: Likewise.
	* Makefile.in: Likewise.
	* include/internal/fibers.h: Likewise.
	* include/internal/phsa-queue-interface.h: Likewise.
	* include/internal/phsa-rt.h: Likewise.
	* include/internal/workitems.h: Likewise.
	* rt/arithmetic.c: Likewise.
	* rt/atomics.c: Likewise.
	* rt/bitstring.c: Likewise.
	* rt/fbarrier.c: Likewise.
	* rt/fibers.c: Likewise.
	* rt/fp16.c: Likewise.
	* rt/misc.c: Likewise.
	* rt/multimedia.c: Likewise.
	* rt/queue.c: Likewise.
	* rt/sat_arithmetic.c: Likewise.
	* rt/segment.c: Likewise.
	* rt/workitems.c: Likewise.


Co-Authored-By: Martin Jambor <mjambor@suse.cz>

From-SVN: r244867
2017-01-24 13:45:56 +01:00
Jakub Jelinek
b32e85fa42 cuda.h (CUdeviceptr): Typedef to unsigned long long even for _WIN64.
* plugin/cuda/cuda.h (CUdeviceptr): Typedef to unsigned long long even
	for _WIN64.

From-SVN: r244638
2017-01-19 16:53:51 +01:00
Jakub Jelinek
d190d5c093 hsa.h: Add GCC runtime library exception.
* plugin/hsa.h: Add GCC runtime library exception.
	* plugin/hsa_ext_finalize.h: Likewise.

From-SVN: r244523
2017-01-17 10:45:23 +01:00
Jakub Jelinek
2393d337e7 configfrag.ac: For --without-cuda-driver don't initialize CUDA_DRIVER_INCLUDE nor CUDA_DRIVER_LIB.
* plugin/configfrag.ac: For --without-cuda-driver don't initialize
	CUDA_DRIVER_INCLUDE nor CUDA_DRIVER_LIB.  If both
	CUDA_DRIVER_INCLUDE and CUDA_DRIVER_LIB are empty and linking small
	cuda program fails, define PLUGIN_NVPTX_DYNAMIC to 1 and use
	plugin/include/cuda as include dir and -ldl instead of -lcuda as
	library to link ptx plugin against.
	* plugin/plugin-nvptx.c: Include dlfcn.h if PLUGIN_NVPTX_DYNAMIC.
	(CUDA_CALLS): Define.
	(cuda_lib, cuda_lib_inited): New variables.
	(init_cuda_lib): New function.
	(CUDA_CALL_PREFIX): Define.
	(CUDA_CALL_ERET, CUDA_CALL_ASSERT): Use CUDA_CALL_PREFIX.
	(CUDA_CALL): Use FN instead of (FN).
	(CUDA_CALL_NOCHECK): Define.
	(cuda_error, fini_streams_for_device, select_stream_for_async,
	nvptx_attach_host_thread_to_device, nvptx_open_device, link_ptx,
	event_gc, nvptx_exec, nvptx_async_test, nvptx_async_test_all,
	nvptx_wait_all, nvptx_set_clocktick, GOMP_OFFLOAD_unload_image,
	nvptx_stacks_alloc, nvptx_stacks_free, GOMP_OFFLOAD_run): Use
	CUDA_CALL_NOCHECK.
	(nvptx_init): Call init_cuda_lib, if it fails, return false.  Use
	CUDA_CALL_NOCHECK.
	(nvptx_get_num_devices): Call init_cuda_lib, if it fails, return 0.
	Use CUDA_CALL_NOCHECK.
	* plugin/cuda/cuda.h: New file.
	* config.h.in: Regenerated.
	* configure: Regenerated.

From-SVN: r244522
2017-01-17 10:44:17 +01:00
Jakub Jelinek
cbe34bb5ed Update copyright years.
From-SVN: r243994
2017-01-01 13:07:43 +01:00
Alexander Monakov
6103184e81 OpenMP offloading to NVPTX: libgomp changes
* Makefile.am (libgomp_la_SOURCES): Add atomic.c, icv.c, icv-device.c.
	* Makefile.in. Regenerate.
	* configure.ac [nvptx*-*-*] (libgomp_use_pthreads): Set and use it...
	(LIBGOMP_USE_PTHREADS): ...here; new define.
	* configure: Regenerate.
	* config.h.in: Likewise.
	* config/posix/affinity.c: Move to...
	* affinity.c: ...here (new file).  Guard use of Pthreads-specific
	interface by LIBGOMP_USE_PTHREADS. 
	* critical.c: Split out GOMP_atomic_{start,end} into...
	* atomic.c: ...here (new file).
	* env.c: Split out ICV definitions into...
	* icv.c: ...here (new file) and...
	* icv-device.c: ...here. New file.
	* config/linux/lock.c (gomp_init_lock_30): Move to generic lock.c.
	(gomp_destroy_lock_30): Ditto.
	(gomp_set_lock_30): Ditto.
	(gomp_unset_lock_30): Ditto.
	(gomp_test_lock_30): Ditto.
	(gomp_init_nest_lock_30): Ditto.
	(gomp_destroy_nest_lock_30): Ditto.
	(gomp_set_nest_lock_30): Ditto.
	(gomp_unset_nest_lock_30): Ditto.
	(gomp_test_nest_lock_30): Ditto.
	* lock.c: New.
	* config/nvptx/lock.c: New.
	* config/nvptx/bar.c: New.
	* config/nvptx/bar.h: New.
	* config/nvptx/doacross.h: New.
	* config/nvptx/error.c: New.
	* config/nvptx/icv-device.c: New.
	* config/nvptx/mutex.h: New.
	* config/nvptx/pool.h: New.
	* config/nvptx/proc.c: New.
	* config/nvptx/ptrlock.h: New.
	* config/nvptx/sem.h: New.
	* config/nvptx/simple-bar.h: New.
	* config/nvptx/target.c: New.
	* config/nvptx/task.c: New.
	* config/nvptx/team.c: New.
	* config/nvptx/time.c: New.
	* config/posix/simple-bar.h: New.
	* libgomp.h: Guard pthread.h inclusion.  Include simple-bar.h.
	(gomp_num_teams_var): Declare.
	(struct gomp_thread_pool): Change threads_dock member to
	gomp_simple_barrier_t.
	[__nvptx__] (gomp_thread): New implementation.
	(gomp_thread_attr): Guard by LIBGOMP_USE_PTHREADS.
	(gomp_thread_destructor): Ditto.
	(gomp_init_thread_affinity): Ditto.
	* team.c: Guard uses of Pthreads-specific interfaces by
	LIBGOMP_USE_PTHREADS.  Adjust all uses of threads_dock.
	(gomp_free_thread) [__nvptx__]: Do not call 'free'.

	* config/nvptx/alloc.c: Delete.
	* config/nvptx/barrier.c: Ditto.
	* config/nvptx/fortran.c: Ditto.
	* config/nvptx/iter.c: Ditto.
	* config/nvptx/iter_ull.c: Ditto.
	* config/nvptx/loop.c: Ditto.
	* config/nvptx/loop_ull.c: Ditto.
	* config/nvptx/ordered.c: Ditto.
	* config/nvptx/parallel.c: Ditto.
	* config/nvptx/priority_queue.c: Ditto.
	* config/nvptx/sections.c: Ditto.
	* config/nvptx/single.c: Ditto.
	* config/nvptx/splay-tree.c: Ditto.
	* config/nvptx/work.c: Ditto.

	* testsuite/libgomp.fortran/fortran.exp (lang_link_flags): Pass
	-foffload=-lgfortran in addition to -lgfortran.
	* testsuite/libgomp.oacc-fortran/fortran.exp (lang_link_flags): Ditto.

	* plugin/plugin-nvptx.c: Include <limits.h>.
	(struct targ_fn_descriptor): Add new fields.
	(struct ptx_device): Ditto.  Set them...
	(nvptx_open_device): ...here.
	(nvptx_adjust_launch_bounds): New.
	(nvptx_host2dev): Allow NULL 'nvthd'.
	(nvptx_dev2host): Ditto.
	(GOMP_OFFLOAD_get_caps): Add GOMP_OFFLOAD_CAP_OPENMP_400.
	(link_ptx): Adjust log sizes.
	(nvptx_host2dev): Allow NULL 'nvthd'.
	(nvptx_dev2host): Ditto.
	(nvptx_set_clocktick): New.  Use it...
	(GOMP_OFFLOAD_load_image): ...here.  Set new targ_fn_descriptor
	fields.
	(GOMP_OFFLOAD_dev2dev): New.
	(nvptx_adjust_launch_bounds): New.
	(nvptx_stacks_size): New.
	(nvptx_stacks_alloc): New.
	(nvptx_stacks_free): New.
	(GOMP_OFFLOAD_run): New.
	(GOMP_OFFLOAD_async_run): New (stub).

Co-Authored-By: Dmitry Melnik <dm@ispras.ru>
Co-Authored-By: Jakub Jelinek <jakub@redhat.com>

From-SVN: r242789
2016-11-23 21:36:41 +03:00
Martin Liska
b8d89b03db Remove build dependence on HSA run-time
2016-11-23  Martin Liska  <mliska@suse.cz>
            Martin Jambor  <mjambor@suse.cz>

gcc/
	* doc/install.texi: Remove entry about --with-hsa-kmt-lib.

libgomp/
	* plugin/hsa.h: New file.
	* plugin/hsa_ext_finalize.h: New file.
	* plugin/configfrag.ac: Remove hsa-kmt-lib test.  Added checks for
	header file unistd.h, and functions secure_getenv, __secure_getenv,
	getuid, geteuid, getgid and getegid.
	* plugin/Makefrag.am (libgomp_plugin_hsa_la_CPPFLAGS): Added
	-D_GNU_SOURCE.
	* plugin/plugin-hsa.c: Include config.h, inttypes.h and stdbool.h.
	Handle various cases of secure_getenv presence, add an implementation
	when we can test effective UID and GID.
	(struct hsa_runtime_fn_info): New structure.
	(hsa_runtime_fn_info hsa_fns): New variable.
	(hsa_runtime_lib): Likewise.
	(support_cpu_devices): Likewise.
	(init_enviroment_variables): Load newly introduced ENV
	variables.
	(hsa_warn): Call hsa run-time functions via hsa_fns structure.
	(hsa_fatal): Likewise.
	(DLSYM_FN): New macro.
	(init_hsa_runtime_functions): New function.
	(suitable_hsa_agent_p): Call hsa run-time functions via hsa_fns
	structure.  Depending on environment, also allow CPU devices.
	(init_hsa_context): Call hsa run-time functions via hsa_fns structure.
	(get_kernarg_memory_region): Likewise.
	(GOMP_OFFLOAD_init_device): Likewise.
	(destroy_hsa_program): Likewise.
	(init_basic_kernel_info): New function.
	(GOMP_OFFLOAD_load_image): Use it.
	(create_and_finalize_hsa_program): Call hsa run-time functions via
	hsa_fns structure.
	(create_single_kernel_dispatch): Likewise.
	(release_kernel_dispatch): Likewise.
	(init_single_kernel): Likewise.
	(parse_target_attributes): Allow up multiple HSA grid dimensions.
	(get_group_size): New function.
	(run_kernel): Likewise.
	(GOMP_OFFLOAD_run): Outline most functionality to run_kernel.
	(GOMP_OFFLOAD_fini_device): Call hsa run-time functions via hsa_fns
	structure.
	* testsuite/lib/libgomp.exp: Remove hsa_kmt_lib support.
	* testsuite/libgomp-test-support.exp.in: Likewise.
	* Makefile.in: Regenerated.
	* aclocal.m4: Likewise.
	* config.h.in: Likewise.
	* configure: Likewise.
	* testsuite/Makefile.in: Likewise.



Co-Authored-By: Martin Jambor <mjambor@suse.cz>

From-SVN: r242749
2016-11-23 13:27:13 +01:00
Cesar Philippidis
6668eb4593 nvptx.c (PTX_GANG_DEFAULT): Set to zero.
gcc/
	* config/nvptx/nvptx.c (PTX_GANG_DEFAULT): Set to zero.

	libgomp/
	* plugin/plugin-nvptx.c (nvptx_exec): Interrogate board attributes
	to determine default geometry.
	* testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Set gang
	dimension.


Co-Authored-By: Nathan Sidwell <nathan@acm.org>

From-SVN: r241803
2016-11-02 15:10:02 -07:00
Chung-Lin Tang
b4557008c4 oacc-plugin.h (GOMP_PLUGIN_async_unmap_vars): Add int parameter.
2016-05-26  Chung-Lin Tang  <cltang@codesourcery.com>

	libgomp/
	* oacc-plugin.h (GOMP_PLUGIN_async_unmap_vars): Add int parameter.
	* oacc-plugin.c (GOMP_PLUGIN_async_unmap_vars): Add 'int async'
	parameter, use to set async stream around call to gomp_unmap_vars,
	call gomp_unmap_vars() with 'do_copyfrom' set to true.
	* plugin/plugin-nvptx.c (struct ptx_event): Add 'int val' field.
	(event_gc): Adjust event handling loop, collect PTX_EVT_ASYNC_CLEANUP
	events and call GOMP_PLUGIN_async_unmap_vars() for each of them.
	(event_add): Add int parameter, initialize 'val' field when
	adding new ptx_event struct.
	(nvptx_evec): Adjust event_add() call arguments.
	(nvptx_host2dev): Likewise.
	(nvptx_dev2host): Likewise.
	(nvptx_wait_async): Likewise.
	(nvptx_wait_all_async): Likewise.
	(GOMP_OFFLOAD_openacc_register_async_cleanup): Add async parameter,
	pass to event_add() call.
	* oacc-host.c (host_openacc_register_async_cleanup): Add 'int async'
	parameter.
	* oacc-mem.c (gomp_acc_remove_pointer): Adjust async case to
	call openacc.register_async_cleanup_func() hook.
	* oacc-parallel.c (GOACC_parallel_keyed): Likewise.
	* target.c (gomp_copy_from_async): Delete function.
	(gomp_map_vars): Remove async_refcount.
	(gomp_unmap_vars): Likewise.
	(gomp_load_image_to_device): Likewise.
	(omp_target_associate_ptr): Likewise.
	* libgomp.h (struct splay_tree_key_s): Remove async_refcount.
	(acc_dispatch_t.register_async_cleanup_func): Add int parameter.
	(gomp_copy_from_async): Remove.

From-SVN: r236772
2016-05-26 13:28:25 +00:00
Chung-Lin Tang
6ce1307231 target.c (gomp_device_copy): New function.
libgomp/
2016-05-26  Chung-Lin Tang  <cltang@codesourcery.com>

	* target.c (gomp_device_copy): New function.
	(gomp_copy_host2dev): Likewise.
	(gomp_copy_dev2host): Likewise.
	(gomp_free_device_memory): Likewise.
	(gomp_map_vars_existing): Adjust to call gomp_copy_host2dev.
	(gomp_map_pointer): Likewise.
	(gomp_map_vars): Adjust to call gomp_copy_host2dev, handle
	NULL value from alloc_func plugin hook.
	(gomp_unmap_tgt): Adjust to call gomp_free_device_memory.
	(gomp_copy_from_async): Adjust to call gomp_copy_dev2host.
	(gomp_unmap_vars): Likewise.
	(gomp_update): Adjust to call gomp_copy_dev2host and
	gomp_copy_host2dev functions.
	(gomp_unload_image_from_device): Handle false value from
	unload_image_func plugin hook.
	(gomp_init_device): Handle false value from init_device_func
	plugin hook.
	(gomp_exit_data): Adjust to call gomp_copy_dev2host.
	(omp_target_free): Adjust to call gomp_free_device_memory.
	(omp_target_memcpy): Handle return values from host2dev_func,
	dev2host_func, and dev2dev_func plugin hooks.
	(omp_target_memcpy_rect_worker): Likewise.
	(gomp_target_fini): Handle false value from fini_device_func
	plugin hook.
	* libgomp.h (struct gomp_device_descr): Adjust return type of
	init_device_func, fini_device_func, unload_image_func, free_func,
	dev2host_func,host2dev_func, and dev2dev_func plugin hooks to 'bool'.
	* oacc-init.c (acc_shutdown_1): Handle false value from
	fini_device_func plugin hook.
	* oacc-host.c (host_init_device): Change return type to bool.
	(host_fini_device): Likewise.
	(host_unload_image): Likewise.
	(host_free): Likewise.
	(host_dev2host): Likewise.
	(host_host2dev): Likewise.
	* oacc-mem.c (acc_free): Handle plugin hook fatal error case.
	(acc_memcpy_to_device): Likewise.
	(acc_memcpy_from_device): Likewise.
	(delete_copyout): Add libfnname parameter, handle free_func
	hook fatal error case.
	(acc_delete): Adjust delete_copyout call.
	(acc_copyout): Likewise.
	(update_dev_host): Move gomp_mutex_unlock to after
	host2dev/dev2host hook calls.

	* plugin/plugin-hsa.c (hsa_warn): Adjust 'hsa_error' local variable
	to 'hsa_error_msg', for clarity.
	(hsa_fatal): Likewise.
	(hsa_error): New function.
	(init_hsa_context): Change return type to bool, adjust to return
	false on error.
	(GOMP_OFFLOAD_get_num_devices): Adjust to handle init_hsa_context
	return value.
	(GOMP_OFFLOAD_init_device): Change return type to bool, adjust to
	return false on error.
	(get_agent_info): Adjust to return NULL on error.
	(destroy_hsa_program): Change return type to bool, adjust to
	return false on error.
	(GOMP_OFFLOAD_load_image): Adjust to return -1 on error.
	(destroy_module): Change return type to bool, adjust to
	return false on error.
	(GOMP_OFFLOAD_unload_image): Likewise.
	(GOMP_OFFLOAD_fini_device): Likewise.
	(GOMP_OFFLOAD_alloc): Change to return NULL when called.
	(GOMP_OFFLOAD_free): Change to return false when called.
	(GOMP_OFFLOAD_dev2host): Likewise.
	(GOMP_OFFLOAD_host2dev): Likewise.
	(GOMP_OFFLOAD_dev2dev): Likewise.

	* plugin/plugin-nvptx.c (CUDA_CALL_ERET): New convenience macro.
	(CUDA_CALL): Likewise.
	(CUDA_CALL_ASSERT): Likewise.
	(map_init): Change return type to bool, use CUDA_CALL* macros.
	(map_fini): Likewise.
	(init_streams_for_device): Change return type to bool, adjust
	call to map_init.
	(fini_streams_for_device): Change return type to bool, adjust
	call to map_fini.
	(select_stream_for_async): Release stream_lock before calls to
	GOMP_PLUGIN_fatal, adjust call to map_init.
	(nvptx_init): Use CUDA_CALL* macros.
	(nvptx_attach_host_thread_to_device): Change return type to bool,
	use CUDA_CALL* macros.
	(nvptx_open_device): Use CUDA_CALL* macros.
	(nvptx_close_device): Change return type to bool, use CUDA_CALL*
	macros.
	(nvptx_get_num_devices): Use CUDA_CALL* macros.
	(link_ptx): Change return type to bool, use CUDA_CALL* macros.
	(nvptx_exec): Use CUDA_CALL* macros.
	(nvptx_alloc): Use CUDA_CALL* macros.
	(nvptx_free): Change return type to bool, use CUDA_CALL* macros.
	(nvptx_host2dev): Likewise.
	(nvptx_dev2host): Likewise.
	(nvptx_wait): Use CUDA_CALL* macros.
	(nvptx_wait_async): Likewise.
	(nvptx_wait_all): Likewise.
	(nvptx_wait_all_async): Likewise.
	(nvptx_set_cuda_stream): Adjust order of stream_lock acquire,
	use CUDA_CALL* macros, adjust call to map_fini.
	(GOMP_OFFLOAD_init_device): Change return type to bool,
	adjust code accordingly.
	(GOMP_OFFLOAD_fini_device): Likewise.
	(GOMP_OFFLOAD_load_image): Adjust calls to
	nvptx_attach_host_thread_to_device/link_ptx to handle errors,
	use CUDA_CALL* macros.
	(GOMP_OFFLOAD_unload_image): Change return type to bool, adjust
	return code.
	(GOMP_OFFLOAD_alloc): Adjust calls to code to handle error return.
	(GOMP_OFFLOAD_free): Change return type to bool, adjust calls to
	handle error return.
	(GOMP_OFFLOAD_dev2host): Likewise.
	(GOMP_OFFLOAD_host2dev): Likewise.
	(GOMP_OFFLOAD_openacc_register_async_cleanup): Use CUDA_CALL* macros.
	(GOMP_OFFLOAD_openacc_create_thread_data): Likewise.

liboffloadmic/
2016-05-26  Chung-Lin Tang  <cltang@codesourcery.com>

	* plugin/libgomp-plugin-intelmic.cpp (offload): Change return type
	to bool, adjust return code.
	(GOMP_OFFLOAD_init_device): Likewise.
	(GOMP_OFFLOAD_fini_device): Likewise.
	(get_target_table): Likewise.
	(offload_image): Likwise.
	(GOMP_OFFLOAD_load_image): Adjust call to offload_image(), change
	to return -1 on error.
	(GOMP_OFFLOAD_unload_image): Change return type to bool, adjust return
	code.
	(GOMP_OFFLOAD_alloc): Likewise.
	(GOMP_OFFLOAD_free): Likewise.
	(GOMP_OFFLOAD_host2dev): Likewise.
	(GOMP_OFFLOAD_dev2host): Likewise.
	(GOMP_OFFLOAD_dev2dev): Likewise.

From-SVN: r236768
2016-05-26 09:58:56 +00:00
Alexander Monakov
c2bd3b6911 libgomp nvptx plugin: make cuMemFreeHost error non-fatal
From-SVN: r235339
2016-04-21 16:11:47 +03:00
Martin Liska
f9c8babbab Properly assign to packet header (PR hsa/70394)
* plugin/plugin-hsa.c (packet_store_release): New function
	that is taken from the HSA runtime manual.
	(GOMP_OFFLOAD_run): Use the function.

From-SVN: r234454
2016-03-24 13:04:12 +00:00
Martin Liska
7397fce2f7 Copy shadow argument conditionally (PR hsa/70337)
PR hsa/70337
	* plugin/plugin-hsa.c (GOMP_OFFLOAD_run): Copy shadow
	argument just in case a dispatched kernel uses that argument.

From-SVN: r234418
2016-03-23 09:59:51 +00:00
Thomas Schwinge
f99c355797 Use plain -fopenacc to enable OpenACC kernels processing
gcc/
	* tree-parloops.c (create_parallel_loop, gen_parallel_loop)
	(parallelize_loops): In OpenACC kernels mode, set n_threads to
	zero.
	(pass_parallelize_loops::gate): In OpenACC kernels mode, gate on
	flag_openacc.
	* tree-ssa-loop.c (gate_oacc_kernels): Likewise.
	gcc/testsuite/
	* c-c++-common/goacc/kernels-counter-vars-function-scope.c: Adjust
	to -ftree-parallelize-loops/-fopenacc changes.
	* c-c++-common/goacc/kernels-double-reduction-n.c: Likewise.
	* c-c++-common/goacc/kernels-double-reduction.c: Likewise.
	* c-c++-common/goacc/kernels-loop-2.c: Likewise.
	* c-c++-common/goacc/kernels-loop-3.c: Likewise.
	* c-c++-common/goacc/kernels-loop-g.c: Likewise.
	* c-c++-common/goacc/kernels-loop-mod-not-zero.c: Likewise.
	* c-c++-common/goacc/kernels-loop-n.c: Likewise.
	* c-c++-common/goacc/kernels-loop-nest.c: Likewise.
	* c-c++-common/goacc/kernels-loop.c: Likewise.
	* c-c++-common/goacc/kernels-one-counter-var.c: Likewise.
	* c-c++-common/goacc/kernels-reduction.c: Likewise.
	* gfortran.dg/goacc/kernels-loop-inner.f95: Likewise.
	* gfortran.dg/goacc/kernels-loops-adjacent.f95: Likewise.
	libgomp/
	* oacc-parallel.c (GOACC_parallel_keyed): Initialize dims.
	* plugin/plugin-nvptx.c (nvptx_exec): Provide default values for
	dims.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: Adjust to
	-ftree-parallelize-loops/-fopenacc changes.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c:
	Likewise.

From-SVN: r233634
2016-02-23 16:07:54 +01:00
Thomas Schwinge
033ff3d130 libgomp: Use HSA_RUNTIME_LIB, HSA_KMT_LIB in the testsuite
libgomp/
	* plugin/configfrag.ac (HSA_KMT_LIB, HSA_KMT_LDFLAGS): New
	variables.
	* testsuite/libgomp-test-support.exp.in (hsa_runtime_lib)
	(hsa_kmt_lib): Set variables.
	* testsuite/lib/libgomp.exp (libgomp_init): Use them to amend
	always_ld_library_path.
	* Makefile.in: Regenerate.
	* configure: Likewise.
	* testsuite/Makefile.in: Likewise.

From-SVN: r233072
2016-02-02 13:48:31 +01:00
Thomas Schwinge
4a88d9b77a libgomp: For hsa offloading, compilation is all handled by the target compiler
libgomp/
	* plugin/configfrag.ac (offload_additional_options)
	(offload_additional_lib_paths): Don't amend for hsa offloading.
	* configure: Regenerate.

From-SVN: r233071
2016-02-02 13:48:21 +01:00
Thomas Schwinge
41d809d3c8 libgomp: Don't configure for offloading target if we don't build the corresponding plugin
libgomp/
	* plugin/configfrag.ac: Don't configure for offloading target if
	we don't build the corresponding plugin.
	* configure: Regenerate.

From-SVN: r233070
2016-02-02 13:48:04 +01:00
Martin Jambor
b2b4005150 Merge of HSA
2016-01-19  Martin Jambor  <mjambor@suse.cz>
	    Martin Liska  <mliska@suse.cz>
	    Michael Matz <matz@suse.de>

libgomp/
	* plugin/Makefrag.am: Add HSA plugin requirements.
	* plugin/configfrag.ac (HSA_RUNTIME_INCLUDE): New variable.
	(HSA_RUNTIME_LIB): Likewise.
	(HSA_RUNTIME_CPPFLAGS): Likewise.
	(HSA_RUNTIME_INCLUDE): New substitution.
	(HSA_RUNTIME_LIB): Likewise.
	(HSA_RUNTIME_LDFLAGS): Likewise.
	(hsa-runtime): New configure option.
	(hsa-runtime-include): Likewise.
	(hsa-runtime-lib): Likewise.
	(PLUGIN_HSA): New substitution variable.
	Fill HSA_RUNTIME_INCLUDE and HSA_RUNTIME_LIB according to the new
	configure options.
	(PLUGIN_HSA_CPPFLAGS): Likewise.
	(PLUGIN_HSA_LDFLAGS): Likewise.
	(PLUGIN_HSA_LIBS): Likewise.
	Check that we have access to HSA run-time.
	* libgomp-plugin.h (offload_target_type): New element
	OFFLOAD_TARGET_TYPE_HSA.
	* libgomp.h (gomp_target_task): New fields firstprivate_copies and
	args.
	(bool gomp_create_target_task): Updated.
	(gomp_device_descr): Extra parameter of run_func and async_run_func,
	new field can_run_func.
	* libgomp_g.h (GOMP_target_ext): Update prototype.
	* oacc-host.c (host_run): Added a new parameter args.
	* target.c (calculate_firstprivate_requirements): New function.
	(copy_firstprivate_data): Likewise.
	(gomp_target_fallback_firstprivate): Use them.
	(gomp_target_unshare_firstprivate): New function.
	(gomp_get_target_fn_addr): Allow returning NULL for shared memory
	devices.
	(GOMP_target): Do host fallback for all shared memory devices.  Do not
	pass any args to plugins.
	(GOMP_target_ext): Introduce device-specific argument parameter args.
	Allow host fallback if device shares memory.  Do not remap data if
	device has shared memory.
	(gomp_target_task_fn): Likewise.  Also treat shared memory devices
	like host fallback for mappings.
	(GOMP_target_data): Treat shared memory devices like host fallback.
	(GOMP_target_data_ext): Likewise.
	(GOMP_target_update): Likewise.
	(GOMP_target_update_ext): Likewise.  Also pass NULL as args to
	gomp_create_target_task.
	(GOMP_target_enter_exit_data): Likewise.
	(omp_target_alloc): Treat shared memory devices like host fallback.
	(omp_target_free): Likewise.
	(omp_target_is_present): Likewise.
	(omp_target_memcpy): Likewise.
	(omp_target_memcpy_rect): Likewise.
	(omp_target_associate_ptr): Likewise.
	(gomp_load_plugin_for_device): Also load can_run.
	* task.c (GOMP_PLUGIN_target_task_completion): Free
	firstprivate_copies.
	(gomp_create_target_task): Accept new argument args and store it to
	ttask.
	* plugin/plugin-hsa.c: New file.

gcc/
	* Makefile.in (OBJS): Add new source files.
	(GTFILES): Add hsa.c.
	* common.opt (disable_hsa): New variable.
	(-Whsa): New warning.
	* config.in (ENABLE_HSA): New.
	* configure.ac: Treat hsa differently from other accelerators.
	(OFFLOAD_TARGETS): Define ENABLE_OFFLOADING according to
	$enable_offloading.
	(ENABLE_HSA): Define ENABLE_HSA according to $enable_hsa.
	* doc/install.texi (Configuration): Document --with-hsa-runtime,
	--with-hsa-runtime-include, --with-hsa-runtime-lib and
	--with-hsa-kmt-lib.
	* doc/invoke.texi (-Whsa): Document.
	(hsa-gen-debug-stores): Likewise.
	* lto-wrapper.c (compile_images_for_offload_targets): Do not attempt
	to invoke offload compiler for hsa acclerator.
	* opts.c (common_handle_option): Determine whether HSA offloading
	should be performed.
	* params.def (PARAM_HSA_GEN_DEBUG_STORES): New parameter.
	* builtin-types.def (BT_FN_VOID_UINT_PTR_INT_PTR): New.
	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Removed.
	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New.
	* gimple-low.c (lower_stmt): Also handle GIMPLE_OMP_GRID_BODY.
	* gimple-pretty-print.c (dump_gimple_omp_for): Also handle
	GF_OMP_FOR_KIND_GRID_LOOP.
	(dump_gimple_omp_block): Also handle GIMPLE_OMP_GRID_BODY.
	(pp_gimple_stmt_1): Likewise.
	* gimple-walk.c (walk_gimple_stmt): Likewise.
	* gimple.c (gimple_build_omp_grid_body): New function.
	(gimple_copy): Also handle GIMPLE_OMP_GRID_BODY.
	* gimple.def (GIMPLE_OMP_GRID_BODY): New.
	* gimple.h (enum gf_mask): Added GF_OMP_PARALLEL_GRID_PHONY,
	GF_OMP_FOR_KIND_GRID_LOOP, GF_OMP_FOR_GRID_PHONY and
	GF_OMP_TEAMS_GRID_PHONY.
	(gimple_statement_omp_single_layout): Updated comments.
	(gimple_build_omp_grid_body): New function.
	(gimple_has_substatements): Also handle GIMPLE_OMP_GRID_BODY.
	(gimple_omp_for_grid_phony): New function.
	(gimple_omp_for_set_grid_phony): Likewise.
	(gimple_omp_parallel_grid_phony): Likewise.
	(gimple_omp_parallel_set_grid_phony): Likewise.
	(gimple_omp_teams_grid_phony): Likewise.
	(gimple_omp_teams_set_grid_phony): Likewise.
	(gimple_return_set_retbnd): Also handle GIMPLE_OMP_GRID_BODY.
	* omp-builtins.def (BUILT_IN_GOMP_OFFLOAD_REGISTER): New.
	(BUILT_IN_GOMP_OFFLOAD_UNREGISTER): Likewise.
	(BUILT_IN_GOMP_TARGET): Updated type.
	* omp-low.c: Include symbol-summary.h, hsa.h and params.h.
	(adjust_for_condition): New function.
	(get_omp_for_step_from_incr): Likewise.
	(extract_omp_for_data): Moved parts to adjust_for_condition and
	get_omp_for_step_from_incr.
	(build_outer_var_ref): Handle GIMPLE_OMP_GRID_BODY.
	(fixup_child_record_type): Bail out if receiver_decl is NULL.
	(scan_sharing_clauses): Handle OMP_CLAUSE__GRIDDIM_.
	(scan_omp_parallel): Do not create child functions for phony
	constructs.
	(check_omp_nesting_restrictions): Handle GIMPLE_OMP_GRID_BODY.
	(scan_omp_1_op): Checking assert we are not remapping to
	ERROR_MARK.  Also also handle GIMPLE_OMP_GRID_BODY.
	(parallel_needs_hsa_kernel_p): New function.
	(expand_parallel_call): Register apprpriate parallel child
	functions as HSA kernels.
	(grid_launch_attributes_trees): New type.
	(grid_attr_trees): New variable.
	(grid_create_kernel_launch_attr_types): New function.
	(grid_insert_store_range_dim): Likewise.
	(grid_get_kernel_launch_attributes): Likewise.
	(get_target_argument_identifier_1): Likewise.
	(get_target_argument_identifier): Likewise.
	(get_target_argument_value): Likewise.
	(push_target_argument_according_to_value): Likewise.
	(get_target_arguments): Likewise.
	(expand_omp_target): Call get_target_arguments instead of looking
	up for teams and thread limit.
	(grid_expand_omp_for_loop): New function.
	(grid_arg_decl_map): New type.
	(grid_remap_kernel_arg_accesses): New function.
	(grid_expand_target_kernel_body): New function.
	(expand_omp): Call it.
	(lower_omp_for): Do not emit phony constructs.
	(lower_omp_taskreg): Do not emit phony constructs but create for them
	a temporary variable receiver_decl.
	(lower_omp_taskreg): Do not emit phony constructs.
	(lower_omp_teams): Likewise.
	(lower_omp_grid_body): New function.
	(lower_omp_1): Call it.
	(grid_reg_assignment_to_local_var_p): New function.
	(grid_seq_only_contains_local_assignments): Likewise.
	(grid_find_single_omp_among_assignments_1): Likewise.
	(grid_find_single_omp_among_assignments): Likewise.
	(grid_find_ungridifiable_statement): Likewise.
	(grid_target_follows_gridifiable_pattern): Likewise.
	(grid_remap_prebody_decls): Likewise.
	(grid_copy_leading_local_assignments): Likewise.
	(grid_process_kernel_body_copy): Likewise.
	(grid_attempt_target_gridification): Likewise.
	(grid_gridify_all_targets_stmt): Likewise.
	(grid_gridify_all_targets): Likewise.
	(execute_lower_omp): Call grid_gridify_all_targets.
	(make_gimple_omp_edges): Handle GIMPLE_OMP_GRID_BODY.
	* tree-core.h (omp_clause_code): Added OMP_CLAUSE__GRIDDIM_.
	(tree_omp_clause): Added union field dimension.
	* tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE__GRIDDIM_.
	* tree.c (omp_clause_num_ops): Added number of arguments of
	OMP_CLAUSE__GRIDDIM_.
	(omp_clause_code_name): Added name of OMP_CLAUSE__GRIDDIM_.
	(walk_tree_1): Handle OMP_CLAUSE__GRIDDIM_.
	* tree.h (OMP_CLAUSE_GRIDDIM_DIMENSION): New.
	(OMP_CLAUSE_SET_GRIDDIM_DIMENSION): Likewise.
	(OMP_CLAUSE_GRIDDIM_SIZE): Likewise.
	(OMP_CLAUSE_GRIDDIM_GROUP): Likewise.
	* passes.def: Schedule pass_ipa_hsa and pass_gen_hsail.
	* tree-pass.h (make_pass_gen_hsail): Declare.
	(make_pass_ipa_hsa): Likewise.
	* ipa-hsa.c: New file.
	* lto-section-in.c (lto_section_name): Add hsa section name.
	* lto-streamer.h (lto_section_type): Add hsa section.
	* timevar.def (TV_IPA_HSA): New.
        * hsa-brig-format.h: New file.
	* hsa-brig.c: New file.
	* hsa-dump.c: Likewise.
	* hsa-gen.c: Likewise.
	* hsa.c: Likewise.
	* hsa.h: Likewise.
	* toplev.c (compile_file): Call hsa_output_brig.
	* hsa-regalloc.c: New file.

gcc/fortran/
	* types.def (BT_FN_VOID_UINT_PTR_INT_PTR): New.
	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Removed.
	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New.

gcc/lto/
	* lto-partition.c: Include "hsa.h"
	(add_symbol_to_partition_1): Put hsa implementations into the
	same partition as host implementations.

liboffloadmic/
	* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_async_run): New
	unused parameter.
	(GOMP_OFFLOAD_run): Likewise.

include/
	* gomp-constants.h (GOMP_DEVICE_HSA): New macro.
	(GOMP_VERSION_HSA): Likewise.
	(GOMP_TARGET_ARG_DEVICE_MASK): Likewise.
	(GOMP_TARGET_ARG_DEVICE_ALL): Likewise.
	(GOMP_TARGET_ARG_SUBSEQUENT_PARAM): Likewise.
	(GOMP_TARGET_ARG_ID_MASK): Likewise.
	(GOMP_TARGET_ARG_NUM_TEAMS): Likewise.
	(GOMP_TARGET_ARG_THREAD_LIMIT): Likewise.
	(GOMP_TARGET_ARG_VALUE_SHIFT): Likewise.
	(GOMP_TARGET_ARG_HSA_KERNEL_ATTRIBUTES): Likewise.

From-SVN: r232549
2016-01-19 11:35:10 +01:00
Alexander Monakov
0d58938ed7 nvptx plugin: do not force JIT target SM version
When link_ptx runs, a CUDA device is already bound to current thread, so the
driver library knows the target architecture.  There isn't any benefit from
forcing a specific target here; on the contrary, hardcoding sm_30 breaks
offloading on later (Maxwell, sm_5x) devices.
    
	* plugin/plugin-nvptx.c (link_ptx): Do not set CU_JIT_TARGET.

From-SVN: r232227
2016-01-11 15:55:31 +03:00
Jakub Jelinek
818ab71a41 Update copyright years.
From-SVN: r232055
2016-01-04 15:30:50 +01:00
Nathan Sidwell
5c06742f6f libgomp.h (struct acc_dispatch_t): Remove args from exec_func.
* libgomp.h (struct acc_dispatch_t): Remove args from exec_func.
	* plugin/plugin-nvptx.c (nvptx_exec): Remove sizes & kinds arg.
	(GOMP_OFFLOAD_openacc_parallel): Likewise.
	* oacc-host.c (host_openacc_exec): Likewise.
	* oacc-parallel.c (GOACC_parallel_keyed): Adjust exec_func call.

From-SVN: r229721
2015-11-03 20:18:33 +00:00
Nathan Sidwell
a1c1908bbd plugin-nvptx.c (nvptx_exec): Remove check on compute dimensions.
* plugin/plugin-nvptx.c (nvptx_exec): Remove check on compute
	dimensions.

From-SVN: r229471
2015-10-28 03:00:10 +00:00