OpenE2K/gcc - gcc - Expired Mentality Git

Author	SHA1	Message	Date
Jakub Jelinek	b5c26449f3	re PR libgomp/90585 (libgomp hsa plugin ftbfs in the x32 multilib variant) PR libgomp/90585 * plugin/plugin-hsa.c: Include gstdint.h. Include inttypes.h only if HAVE_INTTYPES_H is defined. (print_uint64_t): New typedef. (PRIu64): Define if HAVE_INTTYPES_H is not defined. (print_kernel_dispatch, run_kernel): Use PRIu64 macro instead of "lu", cast uint64_t HSA_DEBUG and fprintf arguments to print_uint64_t. (release_kernel_dispatch): Likewise. Cast shadow->debug to uintptr_t before casting to void . plugin/plugin-nvptx.c: Include gstdint.h instead of stdint.h. * oacc-mem.c: Don't include config.h nor stdint.h. * target.c: Don't include config.h. * oacc-cuda.c: Likewise. * oacc-host.c: Don't include stdint.h. From-SVN: r271597	2019-05-24 10:59:37 +02:00
Thomas Schwinge	5fae049dc2	OpenACC Profiling Interface (incomplete) libgomp/ * acc_prof.h: New file. * oacc-profiling.c: Likewise. * Makefile.am (nodist_libsubinclude_HEADERS, libgomp_la_SOURCES): Add these, respectively. * Makefile.in: Regenerate. * env.c (initialize_env): Call goacc_profiling_initialize. * oacc-plugin.c (GOMP_PLUGIN_goacc_thread) (GOMP_PLUGIN_goacc_profiling_dispatch): New functions. * oacc-plugin.h (GOMP_PLUGIN_goacc_thread) (GOMP_PLUGIN_goacc_profiling_dispatch): Declare. * libgomp.map (OACC_2.5.1): Add acc_prof_lookup, acc_prof_register, acc_prof_unregister, and acc_register_library. (GOMP_PLUGIN_1.3): Add GOMP_PLUGIN_goacc_profiling_dispatch, and GOMP_PLUGIN_goacc_thread. * oacc-int.h (struct goacc_thread): Add prof_info, api_info, prof_callbacks_enabled members. (goacc_prof_enabled, goacc_profiling_initialize) (_goacc_profiling_dispatch_p, _goacc_profiling_setup_p) (goacc_profiling_dispatch): Declare. (GOACC_PROF_ENABLED, GOACC_PROFILING_DISPATCH_P) (GOACC_PROFILING_SETUP_P): Define. * oacc-async.c (acc_async_test, acc_async_test_all, acc_wait) (acc_wait_async, acc_wait_all, acc_wait_all_async): Update for OpenACC Profiling Interface. * oacc-cuda.c (acc_get_current_cuda_device) (acc_get_current_cuda_context, acc_get_cuda_stream) (acc_set_cuda_stream): Likewise. * oacc-init.c (acc_init_1, goacc_attach_host_thread_to_device) (acc_init, acc_set_device_type, acc_get_device_type) (acc_get_device_num, goacc_lazy_initialize): Likewise. * oacc-mem.c (acc_malloc, acc_free, memcpy_tofrom_device) (acc_deviceptr, acc_hostptr, acc_is_present, acc_map_data) (acc_unmap_data, present_create_copy, delete_copyout) (update_dev_host): Likewise. * oacc-parallel.c (GOACC_parallel_keyed, GOACC_data_start) (GOACC_data_end, GOACC_enter_exit_data, GOACC_update, GOACC_wait): Likewise. * plugin/plugin-nvptx.c (nvptx_exec, nvptx_alloc, nvptx_free) (GOMP_OFFLOAD_openacc_exec, GOMP_OFFLOAD_openacc_async_exec): Likewise. * libgomp.texi: Update. * testsuite/libgomp.oacc-c-c++-common/acc_prof-dispatch-1.c: New file. * testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/acc_prof-valid_bytes-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/acc_prof-version-1.c: Likewise. From-SVN: r271346	2019-05-17 21:13:36 +02:00
Chung-Lin Tang	1f4c5b9bb2	2019-05-13 Chung-Lin Tang <cltang@codesourcery.com> Reviewed-by: Thomas Schwinge <thomas@codesourcery.com> libgomp/ * libgomp-plugin.h (struct goacc_asyncqueue): Declare. (struct goacc_asyncqueue_list): Likewise. (goacc_aq): Likewise. (goacc_aq_list): Likewise. (GOMP_OFFLOAD_openacc_register_async_cleanup): Remove. (GOMP_OFFLOAD_openacc_async_test): Remove. (GOMP_OFFLOAD_openacc_async_test_all): Remove. (GOMP_OFFLOAD_openacc_async_wait): Remove. (GOMP_OFFLOAD_openacc_async_wait_async): Remove. (GOMP_OFFLOAD_openacc_async_wait_all): Remove. (GOMP_OFFLOAD_openacc_async_wait_all_async): Remove. (GOMP_OFFLOAD_openacc_async_set_async): Remove. (GOMP_OFFLOAD_openacc_exec): Adjust declaration. (GOMP_OFFLOAD_openacc_cuda_get_stream): Likewise. (GOMP_OFFLOAD_openacc_cuda_set_stream): Likewise. (GOMP_OFFLOAD_openacc_async_exec): Declare. (GOMP_OFFLOAD_openacc_async_construct): Declare. (GOMP_OFFLOAD_openacc_async_destruct): Declare. (GOMP_OFFLOAD_openacc_async_test): Declare. (GOMP_OFFLOAD_openacc_async_synchronize): Declare. (GOMP_OFFLOAD_openacc_async_serialize): Declare. (GOMP_OFFLOAD_openacc_async_queue_callback): Declare. (GOMP_OFFLOAD_openacc_async_host2dev): Declare. (GOMP_OFFLOAD_openacc_async_dev2host): Declare. * libgomp.h (struct acc_dispatch_t): Define 'async' sub-struct. (gomp_acc_insert_pointer): Adjust declaration. (gomp_copy_host2dev): New declaration. (gomp_copy_dev2host): Likewise. (gomp_map_vars_async): Likewise. (gomp_unmap_tgt): Likewise. (gomp_unmap_vars_async): Likewise. (gomp_fini_device): Likewise. * oacc-async.c (get_goacc_thread): New function. (get_goacc_thread_device): New function. (lookup_goacc_asyncqueue): New function. (get_goacc_asyncqueue): New function. (acc_async_test): Adjust code to use new async design. (acc_async_test_all): Likewise. (acc_wait): Likewise. (acc_wait_async): Likewise. (acc_wait_all): Likewise. (acc_wait_all_async): Likewise. (goacc_async_free): New function. (goacc_init_asyncqueues): Likewise. (goacc_fini_asyncqueues): Likewise. * oacc-cuda.c (acc_get_cuda_stream): Adjust code to use new async design. (acc_set_cuda_stream): Likewise. * oacc-host.c (host_openacc_exec): Adjust parameters, remove 'async'. (host_openacc_register_async_cleanup): Remove. (host_openacc_async_exec): New function. (host_openacc_async_test): Adjust parameters. (host_openacc_async_test_all): Remove. (host_openacc_async_wait): Remove. (host_openacc_async_wait_async): Remove. (host_openacc_async_wait_all): Remove. (host_openacc_async_wait_all_async): Remove. (host_openacc_async_set_async): Remove. (host_openacc_async_synchronize): New function. (host_openacc_async_serialize): New function. (host_openacc_async_host2dev): New function. (host_openacc_async_dev2host): New function. (host_openacc_async_queue_callback): New function. (host_openacc_async_construct): New function. (host_openacc_async_destruct): New function. (struct gomp_device_descr host_dispatch): Remove initialization of old interface, add intialization of new async sub-struct. * oacc-init.c (acc_shutdown_1): Adjust to use gomp_fini_device. (goacc_attach_host_thread_to_device): Remove old async code usage. * oacc-int.h (goacc_init_asyncqueues): New declaration. (goacc_fini_asyncqueues): Likewise. (goacc_async_copyout_unmap_vars): Likewise. (goacc_async_free): Likewise. (get_goacc_asyncqueue): Likewise. (lookup_goacc_asyncqueue): Likewise. * oacc-mem.c (memcpy_tofrom_device): Adjust code to use new async design. (present_create_copy): Adjust code to use new async design. (delete_copyout): Likewise. (update_dev_host): Likewise. (gomp_acc_insert_pointer): Add async parameter, adjust code to use new async design. (gomp_acc_remove_pointer): Adjust code to use new async design. * oacc-parallel.c (GOACC_parallel_keyed): Adjust code to use new async design. (GOACC_enter_exit_data): Likewise. (goacc_wait): Likewise. (GOACC_update): Likewise. * oacc-plugin.c (GOMP_PLUGIN_async_unmap_vars): Change to assert fail when called, warn as obsolete in comment. * target.c (goacc_device_copy_async): New function. (gomp_copy_host2dev): Remove 'static', add goacc_asyncqueue parameter, add goacc_device_copy_async case. (gomp_copy_dev2host): Likewise. (gomp_map_vars_existing): Add goacc_asyncqueue parameter, adjust code. (gomp_map_pointer): Likewise. (gomp_map_fields_existing): Likewise. (gomp_map_vars_internal): New always_inline function, renamed from gomp_map_vars. (gomp_map_vars): Implement by calling gomp_map_vars_internal. (gomp_map_vars_async): Implement by calling gomp_map_vars_internal, passing goacc_asyncqueue argument. (gomp_unmap_tgt): Remove static, add attribute_hidden. (gomp_unref_tgt): New function. (gomp_unmap_vars_internal): New always_inline function, renamed from gomp_unmap_vars. (gomp_unmap_vars): Implement by calling gomp_unmap_vars_internal. (gomp_unmap_vars_async): Implement by calling gomp_unmap_vars_internal, passing goacc_asyncqueue argument. (gomp_fini_device): New function. (gomp_exit_data): Adjust gomp_copy_dev2host call. (gomp_load_plugin_for_device): Remove old interface, adjust to load new async interface. (gomp_target_fini): Adjust code to call gomp_fini_device. * plugin/plugin-nvptx.c (struct cuda_map): Remove. (struct ptx_stream): Remove. (struct nvptx_thread): Remove current_stream field. (cuda_map_create): Remove. (cuda_map_destroy): Remove. (map_init): Remove. (map_fini): Remove. (map_pop): Remove. (map_push): Remove. (struct goacc_asyncqueue): Define. (struct nvptx_callback): Define. (struct ptx_free_block): Define. (struct ptx_device): Remove null_stream, active_streams, async_streams, stream_lock, and next fields. (enum ptx_event_type): Remove. (struct ptx_event): Remove. (ptx_event_lock): Remove. (ptx_events): Remove. (init_streams_for_device): Remove. (fini_streams_for_device): Remove. (select_stream_for_async): Remove. (nvptx_init): Remove ptx_events and ptx_event_lock references. (nvptx_attach_host_thread_to_device): Remove CUDA_ERROR_NOT_PERMITTED case. (nvptx_open_device): Add free_blocks initialization, remove init_streams_for_device call. (nvptx_close_device): Remove fini_streams_for_device call, add free_blocks destruct code. (event_gc): Remove. (event_add): Remove. (nvptx_exec): Adjust parameters and code. (nvptx_free): Likewise. (nvptx_host2dev): Remove. (nvptx_dev2host): Remove. (nvptx_set_async): Remove. (nvptx_async_test): Remove. (nvptx_async_test_all): Remove. (nvptx_wait): Remove. (nvptx_wait_async): Remove. (nvptx_wait_all): Remove. (nvptx_wait_all_async): Remove. (nvptx_get_cuda_stream): Remove. (nvptx_set_cuda_stream): Remove. (GOMP_OFFLOAD_alloc): Adjust code. (GOMP_OFFLOAD_free): Likewise. (GOMP_OFFLOAD_openacc_register_async_cleanup): Remove. (GOMP_OFFLOAD_openacc_exec): Adjust parameters and code. (GOMP_OFFLOAD_openacc_async_test_all): Remove. (GOMP_OFFLOAD_openacc_async_wait): Remove. (GOMP_OFFLOAD_openacc_async_wait_async): Remove. (GOMP_OFFLOAD_openacc_async_wait_all): Remove. (GOMP_OFFLOAD_openacc_async_wait_all_async): Remove. (GOMP_OFFLOAD_openacc_async_set_async): Remove. (cuda_free_argmem): New function. (GOMP_OFFLOAD_openacc_async_exec): New plugin hook function. (GOMP_OFFLOAD_openacc_create_thread_data): Adjust code. (GOMP_OFFLOAD_openacc_cuda_get_stream): Adjust code. (GOMP_OFFLOAD_openacc_cuda_set_stream): Adjust code. (GOMP_OFFLOAD_openacc_async_construct): New plugin hook function. (GOMP_OFFLOAD_openacc_async_destruct): New plugin hook function. (GOMP_OFFLOAD_openacc_async_test): Remove and re-implement. (GOMP_OFFLOAD_openacc_async_synchronize): New plugin hook function. (GOMP_OFFLOAD_openacc_async_serialize): New plugin hook function. (GOMP_OFFLOAD_openacc_async_queue_callback): New plugin hook function. (cuda_callback_wrapper): New function. (cuda_memcpy_sanity_check): New function. (GOMP_OFFLOAD_host2dev): Remove and re-implement. (GOMP_OFFLOAD_dev2host): Remove and re-implement. (GOMP_OFFLOAD_openacc_async_host2dev): New plugin hook function. (GOMP_OFFLOAD_openacc_async_dev2host): New plugin hook function. From-SVN: r271128	2019-05-13 13:32:00 +00:00
Thomas Schwinge	0a0384b43a	[libgomp] In OpenACC testing, cycle though all offload targets ... instead of through offload plugins. libgomp/ * plugin/configfrag.ac: Populate and AC_SUBST offload_targets. * testsuite/libgomp-test-support.exp.in: Adjust. * testsuite/lib/libgomp.exp: Likewise. Don't populate openacc_device_types_s. (offload_target_to_openacc_device_type): New proc. * testsuite/libgomp.oacc-c++/c++.exp: Adjust. * testsuite/libgomp.oacc-c/c.exp: Likewise. * testsuite/libgomp.oacc-fortran/fortran.exp: Likewise. * Makefile.in: Regenerate. * configure: Likewise. * testsuite/Makefile.in: Likewise. From-SVN: r269108	2019-02-22 11:51:20 +01:00
Thomas Schwinge	ee332b4a9a	[libgomp] Clarify difference between offload target, offload plugin, and OpenACC device type libgomp/ * plugin/configfrag.ac: Populate and AC_SUBST offload_plugins instead of offload_targets, and AC_DEFINE_UNQUOTED OFFLOAD_PLUGINS instead of OFFLOAD_TARGETS. * target.c (gomp_target_init): Adjust. * testsuite/libgomp-test-support.exp.in: Likewise. * testsuite/lib/libgomp.exp: Likewise. Populate openacc_device_types_s instead of offload_targets_s_openacc. (check_effective_target_openacc_nvidia_accel_selected) (check_effective_target_openacc_host_selected): Adjust. * testsuite/libgomp.oacc-c++/c++.exp: Likewise. * testsuite/libgomp.oacc-c/c.exp: Likewise. * testsuite/libgomp.oacc-fortran/fortran.exp: Likewise. * Makefile.in: Regenerate. * config.h.in: Likewise. * configure: Likewise. * testsuite/Makefile.in: Likewise. From-SVN: r269107	2019-02-22 11:51:05 +01:00
Tom de Vries	738c56d410	[nvptx, libgomp] Fix memleak in GOMP_OFFLOAD_fini_device I wrote a test-case: ... int main (void) { for (unsigned i = 0; i < 128; ++i) { acc_init (acc_device_nvidia); acc_shutdown (acc_device_nvidia); } return 0; } ... and ran it under valgrind. The only leak location reported with a frequency of 128, was the allocation of ptx_devices in nvptx_init. Fix this by freeing ptx_devices in GOMP_OFFLOAD_fini_device, once instantiated_devices drops to 0. 2019-01-24 Tom de Vries <tdevries@suse.de> * plugin/plugin-nvptx.c (GOMP_OFFLOAD_fini_device): Free ptx_devices once instantiated_devices drops to 0. From-SVN: r268237	2019-01-24 14:12:19 +00:00
Tom de Vries	4a75460b00	[nvptx, libgomp] Fix cuMemAlloc with size zero Consider test-case: ... int main (void) { #pragma acc parallel async ; #pragma acc parallel async ; #pragma acc wait return 0; } ... This fails with: ... libgomp: cuMemAlloc error: invalid argument Segmentation fault (core dumped) ... The cuMemAlloc error is due to the fact that we're try to allocate 0 bytes. Fix this by preventing calling map_push with size zero argument in nvptx_exec. This also has the consequence that for the abort-1.c test-case, we end up calling cuMemFree during map_fini for the struct cuda_map allocated in map_init, which fails because an abort happened. Fix this by calling cuMemFree with CUDA_CALL_NOCHECK in cuda_map_destroy. 2019-01-23 Tom de Vries <tdevries@suse.de> PR target/PR88946 * plugin/plugin-nvptx.c (cuda_map_destroy): Use CUDA_CALL_NOCHECK for cuMemFree. (nvptx_exec): Don't call map_push if mapnum == 0. * testsuite/libgomp.oacc-c-c++-common/pr88946.c: New test. From-SVN: r268178	2019-01-23 08:16:56 +00:00
Tom de Vries	4fef8e4d8c	[nvptx, libgomp] Fix assert (!s->map->active) in map_fini There are currently two situations where this assert triggers: ... libgomp/plugin/plugin-nvptx.c: map_fini: Assertion `!s->map->active' failed. ... First, in abort-1.c, a parallel region triggering an abort: ... int main (void) { #pragma acc parallel abort (); return 0; } ... The abort is detected in nvptx_exec as the CUDA_ERROR_ILLEGAL_INSTRUCTION return status of the cuStreamSynchronize call after kernel launch, which is then handled by calling non-returning function GOMP_PLUGIN_fatal. Consequently, the map_pop in nvptx_exec that in case of cuStreamSynchronize success would remove or inactive the element added by the map_push earlier in nvptx_exec, does not trigger. With the element no longer active, but still marked active and a member of s->map, we run into the assert during GOMP_OFFLOAD_fini_device, which is triggered from atexit handler gomp_target_fini (which is triggered by the GOMP_PLUGIN_fatal mentioned above calling exit). Second, in pr88941.c, an async parallel region without wait: ... int main (void) { #pragma acc parallel async ; /* no #pragma acc wait / return 0; } ... Because nvptx_exec is handling an async region, it does not call map_pop for the element added by map_push, but schedules an kernel execution completion event to call map_pop. Again, we run into the assert during GOMP_OFFLOAD_fini_device, which is triggered from atexit handler gomp_target_fini, but the exit in this case is triggered by returning from main. So either the kernel is still running, or the kernel has completed but the corresponding event that is supposed to call map_pop is stuck in the event queue, waiting for an event_gc. Fix this by removing the assert, and skipping the freeing of device memory if the map is still marked active (though in the async case, this is more a workaround than an fix). 2019-01-23 Tom de Vries <tdevries@suse.de> PR target/88941 PR target/88939 plugin/plugin-nvptx.c (cuda_map_destroy): Handle map->active case. (map_fini): Remove "assert (!s->map->active)". * testsuite/libgomp.oacc-c-c++-common/pr88941.c: New test. From-SVN: r268177	2019-01-23 08:16:42 +00:00
Tom de Vries	2ee6cb22c1	[nvptx, libgomp] Fix map_push The map field of a struct ptx_stream is a FIFO. The FIFO is implemented as a single linked list, with pop-from-the-front semantics. The function map_pop pops an element, either by: - deallocating the element, if there is more than one element - or marking the element inactive, if there's only one element The responsibility of map_push is to push an element to the back, as well as selecting the element to push, by: - allocating an element, or - reusing the element at the front if inactive and big enough, or - dropping the element at the front if inactive and not big enough, and allocating one that's big enough The current implemention gets at least the first and most basic scenario wrong: > map = cuda_map_create (size); We create an element, and assign it to map. > for (t = s->map; t->next != NULL; t = t->next) > ; We determine the last element in the fifo. > t->next = map; We append the new element. > s->map = map; But here, we throw away the rest of the FIFO, and declare the FIFO to be just the new element. This problem causes the test-case asyncwait-1.c to fail intermittently on some systems. The pr87835.c test-case added here is a a minimized and modified version of asyncwait-1.c (avoiding the kernel construct) that is more likely to fail. Fix this by rewriting map_pop more robustly, by: - seperating the function in two phases: select element, push element - when reusing or dropping an element, making sure that the element is cleanly popped from the queue - rewriting the push element part in such a way that it can handle all cases without needing if statements, such that each line is exercised for each of the three cases. 2019-01-23 Tom de Vries <tdevries@suse.de> PR target/87835 * plugin/plugin-nvptx.c (map_push): Fix adding of allocated element. * testsuite/libgomp.oacc-c-c++-common/pr87835.c: New test. From-SVN: r268176	2019-01-23 08:16:11 +00:00
Tom de Vries	2c2ff1684d	[nvptx] Enable setting vector length using -fopenacc-dim Enable setting vector length using -fopenacc-dim, f.i. -fopenacc-dim=::128. 2019-01-12 Tom de Vries <tdevries@suse.de> * config/nvptx/nvptx.c (nvptx_goacc_validate_dims_1): Alow setting vector length using -fopenacc-dim. * plugin/plugin-nvptx.c (nvptx_exec): Update error message. From-SVN: r267896	2019-01-12 22:19:15 +00:00
Tom de Vries	52d22ece49	[nvptx] Update insufficient launch message for variable vector_length Update message in nvptx libgomp plugin about insufficient resources to launch kernel, to accommodate for the fact the vector_length can now be variable. 2019-01-12 Tom de Vries <tdevries@suse.de> * plugin/plugin-nvptx.c (nvptx_exec): Update insufficient hardware resources diagnostic. From-SVN: r267890	2019-01-12 22:18:00 +00:00
Tom de Vries	052aaaceed	[nvptx] Don't allow vector_length 64 with num_workers 16 When using a compiler build with: ... +#define PTX_DEFAULT_VECTOR_LENGTH PTX_CTA_SIZE ... consider a test-case: ... int main (void) { #pragma acc parallel vector_length (64) #pragma acc loop worker for (unsigned int i = 0; i < 32; i++) #pragma acc loop vector for (unsigned int j = 0; j < 64; j++) ; return 0; } ... If num_workers is 16, either because: - we add a "num_workers (16)" clause on the parallel directive, or - we set "GOMP_OPENACC_DIM=:16:", or - the libgomp plugin chooses 16 num_workers we run into an illegal instruction at runtime, because a bar.sync instruction tries to use a barrier 16. The instruction is illegal, because ptx supports only 16 barriers per CTA, and the valid range is 0..15. The problem is that with a warp-multiple vector length, we use a code generation scheme with a per-worker barrier. And because barrier zero is reserved for per-cta barrier, only the remaining 15 barriers can be used as per-worker barrier, and consequently we can't use num_workers larger than 15. This problem occurs only for vector_length 64. For vector_length 32, we use a different code generation scheme, and for vector_length >= 96, the maximum num_workers is not big enough not to trigger this problem. Also, this problem only occurs for num_workers 16. As explained above, num_workers 15 is safe to use, and 16 is already the maximum num_workers for vector_length 64. This patch fixes the problem in both the compiler (handling "num_workers (16)") and in the libgomp nvptx plugin (with and without "GOMP_OPENACC_DIM=:16:"). 2019-01-11 Tom de Vries <tdevries@suse.de> * config/nvptx/nvptx.c (PTX_CTA_NUM_BARRIERS, PTX_PER_CTA_BARRIER) (PTX_NUM_PER_CTA_BARRIER, PTX_FIRST_PER_WORKER_BARRIER) (PTX_NUM_PER_WORKER_BARRIERS): Define. (nvptx_apply_dim_limits): Prevent vector_length 64 and num_workers 16. * plugin/plugin-nvptx.c (nvptx_exec): Prevent vector_length 64 and num_workers 16. From-SVN: r267838	2019-01-11 11:46:43 +00:00
Tom de Vries	2c372e81a9	[nvptx, libgomp] Don't launch with num_workers == 0 When using a compiler build with: ... +#define PTX_DEFAULT_VECTOR_LENGTH PTX_CTA_SIZE +#define PTX_MAX_VECTOR_LENGTH PTX_CTA_SIZE ... and running the libgomp testsuite, we run into an execution failure in parallel-loop-1.c, due to a cuda launch failure: ... nvptx_exec: kernel f6_none_none$_omp_fn$0: launch gangs=480, workers=0, \ vectors=1024 libgomp: cuLaunchKernel error: invalid argument ... because workers == 0. The workers variable is set to 0 here in nvptx_exec: ... workers = blocks / actual_vectors; ... because actual_vectors is 1024, and blocks is 768: ... cuOccupancyMaxPotentialBlockSize: grid = 10, block = 768 ... Fix this by ensuring that workers is at least one. 2019-01-09 Tom de Vries <tdevries@suse.de> * plugin/plugin-nvptx.c (nvptx_exec): Make sure to launch with at least one worker. From-SVN: r267746	2019-01-09 00:07:45 +00:00
Jakub Jelinek	a554497024	Update copyright years. From-SVN: r267494	2019-01-01 13:31:55 +01:00
Thomas Schwinge	f847198ec3	[PR88495] An OpenACC async queue is always synchronized with itself An OpenACC async queue is always synchronized with itself, so invocations like "#pragma acc wait(0) async(0)", or "acc_wait_async (0, 0)" don't make a lot of sense, but are still valid. libgomp/ PR libgomp/88495 * plugin/plugin-nvptx.c (nvptx_wait_async): Don't refuse "identical parameters". * testsuite/libgomp.oacc-c-c++-common/asyncwait-nop-1.c: Update. * testsuite/libgomp.oacc-c-c++-common/lib-80.c: Remove. From-SVN: r267152	2018-12-14 21:43:02 +01:00
Thomas Schwinge	1404af62dc	[PR88407] [OpenACC] Correctly handle unseen async-arguments ... which turn the operation into a no-op. libgomp/ PR libgomp/88407 * plugin/plugin-nvptx.c (nvptx_async_test, nvptx_wait) (nvptx_wait_async): Unseen async-argument is a no-op. * testsuite/libgomp.oacc-c-c++-common/async_queue-1.c: Update. * testsuite/libgomp.oacc-c-c++-common/data-2-lib.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/data-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-79.c: Likewise. * testsuite/libgomp.oacc-fortran/lib-12.f90: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-71.c: Merge into... * testsuite/libgomp.oacc-c-c++-common/lib-69.c: ... this. Update. * testsuite/libgomp.oacc-c-c++-common/lib-77.c: Merge into... * testsuite/libgomp.oacc-c-c++-common/lib-74.c: ... this. Update From-SVN: r267150	2018-12-14 21:42:40 +01:00
Thomas Schwinge	18c247cc0b	[PR88370] acc_get_cuda_stream/acc_set_cuda_stream: acc_async_sync, acc_async_noval Per my reading of the OpenACC specification (and as supported by secondary documentation, such as code examples, or presentations), it's valid to call "acc_get_cuda_stream"/"acc_set_cuda_stream" also with "acc_async_sync", "acc_async_noval" arguments, not just with the nonnegative values as currently implemented. libgomp/ PR libgomp/88370 * libgomp.texi (acc_get_current_cuda_context, acc_get_cuda_stream) (acc_set_cuda_stream): Clarify. * oacc-cuda.c (acc_get_cuda_stream, acc_set_cuda_stream): Use "async_valid_p". * plugin/plugin-nvptx.c (nvptx_set_cuda_stream): Refuse "async == acc_async_sync". * testsuite/libgomp.oacc-c-c++-common/acc_set_cuda_stream-1.c: New file. * testsuite/libgomp.oacc-c-c++-common/async_queue-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-84.c: Update. * testsuite/libgomp.oacc-c-c++-common/lib-85.c: Likewise. From-SVN: r267147	2018-12-14 21:42:08 +01:00
Cesar Philippidis	2049befdd0	[nvptx] Remove use of CUDA unified memory in libgomp libgomp/ * plugin/plugin-nvptx.c (struct cuda_map): New. (struct ptx_stream): Replace d, h, h_begin, h_end, h_next, h_prev, h_tail with (cuda_map *) map. (cuda_map_create): New function. (cuda_map_destroy): New function. (map_init): Update to use a linked list of cuda_map objects. (map_fini): Likewise. (map_pop): Likewise. (map_push): Likewise. Return CUdeviceptr instead of void. (init_streams_for_device): Remove stales references to ptx_stream members. (select_stream_for_async): Likewise. (nvptx_exec): Update call to map_init. From-SVN: r264397	2018-09-18 08:41:54 -07:00
Cesar Philippidis	bd9b3d3d1a	[nvptx] Use CUDA driver API to select default runtime launch geometry The CUDA driver API starting version 6.5 offers a set of runtime functions to calculate several occupancy-related measures, as a replacement for the occupancy calculator spreadsheet. This patch adds a heuristic for default runtime launch geometry, based on the new runtime function cuOccupancyMaxPotentialBlockSize. Build on x86_64 with nvptx accelerator and ran libgomp testsuite. 2018-08-13 Cesar Philippidis <cesar@codesourcery.com> Tom de Vries <tdevries@suse.de> PR target/85590 * plugin/cuda/cuda.h (CUoccupancyB2DSize): New typedef. (cuOccupancyMaxPotentialBlockSize): Declare. * plugin/cuda-lib.def (cuOccupancyMaxPotentialBlockSize): New CUDA_ONE_CALL_MAYBE_NULL. * plugin/plugin-nvptx.c (CUDA_VERSION < 6050): Define CUoccupancyB2DSize and declare cuOccupancyMaxPotentialBlockSize. (nvptx_exec): Use cuOccupancyMaxPotentialBlockSize to set the default num_gangs and num_workers when the driver supports it. Co-Authored-By: Tom de Vries <tdevries@suse.de> From-SVN: r263505	2018-08-13 12:04:24 +00:00
Tom de Vries	8e09a12f01	[libgomp, nvptx] Fall back to cuLinkAddData/cuLinkCreate if _v2 not found Cuda driver api functions cuLinkAddData and cuLinkCreate are available starting version 5.5. In version 6.5, they are remapped onto _v2 versions. The dlopen interface of the libgomp nvptx plugin uses the _v2 versions, so it won't work with a cuda driver with driver api version lower than 6.5. This patch fixes the problem by testing for the presence of the _v2 versions, and falling back to the original versions in case of absence of the _v2 versions. Build on x86_64 with nvptx accelerator and reg-tested libgomp, both with and without --without-cuda-driver. 2018-08-08 Tom de Vries <tdevries@suse.de> * plugin/cuda-lib.def (cuLinkAddData_v2, cuLinkCreate_v2): Declare using CUDA_ONE_CALL_MAYBE_NULL. * plugin/plugin-nvptx.c (cuLinkAddData, cuLinkCreate): Undef and declare. (cuLinkAddData_v2, cuLinkCreate_v2): Declare. (link_ptx): Fall back to cuLinkAddData/cuLinkCreate if the _v2 versions are not found. From-SVN: r263408	2018-08-08 14:26:37 +00:00
Tom de Vries	cedd9bd016	[libgomp, nvptx] Allow cuGetErrorString to be NULL Cuda driver api function cuGetErrorString is available in version 6.0 and higher. Currently, when the driver that is used does not contain this function, the libgomp nvptx plugin will not build (PLUGIN_NVPTX_DYNAMIC == 0) or run (PLUGIN_NVPTX_DYNAMIC == 1). This patch fixes this problem by testing for the presence of the function, and handling absence. Build on x86_64 with nvptx accelerator and reg-tested libgomp, both with and without --without-cuda-driver. 2018-08-08 Tom de Vries <tdevries@suse.de> * plugin/cuda-lib.def (cuGetErrorString): Use CUDA_ONE_CALL_MAYBE_NULL. * plugin/plugin-nvptx.c (cuda_error): Handle if cuGetErrorString is not present. From-SVN: r263407	2018-08-08 14:26:28 +00:00
Tom de Vries	b113af959c	[libgomp, nvptx] Remove hard-coded const in nvptx_open_device CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR is defined in cuda driver api version 6.0 and higher. Currently nvptx_open_device uses a hard-coded constant instead. This patch fixes that by: - defining CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR to the hardcoded constant at toplevel, if not present in cuda.h, and - using CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR in nvptx_open_device Build on x86_64 with nvptx accelerator and reg-tested libgomp. 2018-08-08 Tom de Vries <tdevries@suse.de> * plugin/plugin-nvptx.c (CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR): Define. (nvptx_open_device): Use CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR. From-SVN: r263406	2018-08-08 14:26:19 +00:00
Tom de Vries	94767dacea	[libgomp, nvptx] Note that cuGetErrorString is in CUDA_VERSION >= 6000 Cuda driver api function cuGetErrorString is available in version 6.0 and higher. This patch: - removes a comment saying the declaration is not available in cuda.h 6.0 - fixes the presence test to use CUDA_VERSION < 6000 - moves the declaration to toplevel Build on x86_64 with nvptx accelerator and reg-tested libgomp. 2018-08-08 Tom de Vries <tdevries@suse.de> * plugin/plugin-nvptx.c (cuda_error): Move declaration of cuGetErrorString ... (cuGetErrorString): ... here. Guard with CUDA_VERSION < 6000. From-SVN: r263405	2018-08-08 14:26:10 +00:00
Tom de Vries	02150de863	[libgomp, nvptx] Handle CUDA_ONE_CALL_MAYBE_NULL This patch adds handling of functions that may not be present in the cuda driver. Such a function can be declared using CUDA_ONE_CALL_MAYBE_NULL in cuda-lib.def, it can be called with the usual convenience macros, but before calling its presence needs to be tested using new macro CUDA_CALL_EXISTS. When using the dlopen interface (PLUGIN_NVPTX_DYNAMIC == 1), we allow non-present functions by allowing dlsym to return NULL. Otherwise (PLUGIN_NVPTX_DYNAMIC == 0) we declare the non-present function to be weak. Build and reg-tested libgomp on x86_64 with nvidia accelerator, with and without --disable-cuda-driver, in combination with a trigger patch that adds a non-existing function foo to cuda-lib.def: ... CUDA_ONE_CALL_MAYBE_NULL (foo) ... and declares it in plugin-nvptx.c: ... CUresult foo (void); ... and then uses it in nvptx_init after the init_cuda_lib call: ... if (CUDA_CALL_EXISTS (foo)) CUDA_CALL (foo); ... Also build and reg-tested on x86_64 with nvidia accelerator, with and without --disable-cuda-driver, in combination with a trigger patch that replaces all CUDA_ONE_CALLs in cuda-lib.def with CUDA_ONE_CALL_MAYBE_NULL, and guards two CUDA_CALLs with CUDA_CALL_EXISTS, one for a regular fn, and one for a fn that is a define in cuda/cuda.h. 2018-08-07 Tom de Vries <tdevries@suse.de> * plugin/plugin-nvptx.c (DO_PRAGMA): Define. (struct cuda_lib_s): Add def/undef of CUDA_ONE_CALL_MAYBE_NULL. (init_cuda_lib): Add new param to CUDA_ONE_CALL_1. Add arg to corresponding call in CUDA_ONE_CALL. Add def/undef of CUDA_ONE_CALL_MAYBE_NULL. (CUDA_CALL_EXISTS): Define. From-SVN: r263346	2018-08-06 22:13:56 +00:00
Tom de Vries	9e28b10779	[libgomp, nvptx] Minimize lifetime of CUDA_ONE_CALL defines This patch makes sure that the lifetimes of the CUDA_ONE_CALL macro (which is defined twice in plugin-nvptx.c) are minimized, to make it obvious that the definitions are used only in the lib-cuda.def include. Build on x86_64 with nvptx accelerator and reg-tested libgomp. 2018-08-07 Tom de Vries <tdevries@suse.de> * plugin/plugin-nvptx.c (struct cuda_lib_s, init_cuda_lib): Put CUDA_ONE_CALL defines right before the cuda-lib.def include, and the corresponding undefs right after. From-SVN: r263345	2018-08-06 22:13:46 +00:00
Tom de Vries	099400909e	[libgomp, nvptx, --without-cuda-driver] Don't use system cuda driver Using libgomp configure option --with-cuda-driver=<dir> we can indicate what cuda driver to use to build the libgomp nvptx plugin. Without such an option, the system cuda driver is used, if available. If not availabe, a dlopen interface is used instead. However, when we use --without-cuda-driver (or the equivalent --with-cuda-driver=no) the system cuda driver is still used if available. This patch fixes that, making sure that --without-cuda-driver selects the dlopen interface. Build on x86_64 with nvptx accelerator and tested libgomp testsuite, with and without option --without-cuda-driver. 2018-08-04 Tom de Vries <tdevries@suse.de> * plugin/configfrag.ac: For --without-cuda-driver, set CUDA_DRIVER_INCLUDE and CUDA_DRIVER_LIB to no. Handle CUDA_DRIVER_INCLUDE == no and CUDA_DRIVER_LIB == no. * configure: Regenerate. From-SVN: r263310	2018-08-04 20:07:22 +00:00
Cesar Philippidis	094db6beb9	[PATCH] Remove use of 'struct map' from plugin (nvptx) libgomp/ * plugin/plugin-nvptx.c (struct map): Removed. (map_init, map_pop): Remove use of struct map. (map_push): Likewise and change argument list. * testsuite/libgomp.oacc-c-c++-common/mapping-1.c: New Co-Authored-By: James Norris <jnorris@codesourcery.com> From-SVN: r263212	2018-08-01 07:09:56 -07:00
Tom de Vries	8c6310a2c2	[libgomp, nvptx] Add cuda-lib.def 2018-08-01 Tom de Vries <tdevries@suse.de> * plugin/cuda-lib.def: New file. Factor out of ... * plugin/plugin-nvptx.c (CUDA_CALLS): ... here. (struct cuda_lib_s, init_cuda_lib): Include cuda-lib.def instead of using CUDA_CALLS. From-SVN: r263208	2018-08-01 13:20:22 +00:00
Tom de Vries	4cdfee3f20	[libgomp, nvptx] Handle per-function max-threads-per-block in default dims Currently parallel-loop-1.c fails at -O0 on a Quadro M1200, because one of the kernel launch configurations exceeds the resources available in the device, due to the default dimensions chosen by the runtime. This patch fixes that by taking the per-function max_threads_per_block into account when using the default dimensions. 2018-07-30 Tom de Vries <tdevries@suse.de> * plugin/plugin-nvptx.c (MIN, MAX): Redefine. (nvptx_exec): Ensure worker and vector default dims don't exceed targ_fn->max_threads_per_block. From-SVN: r263062	2018-07-30 08:17:26 +00:00
Tom de Vries	0b210c43bb	[libgomp, nvptx] Calculate default dims per device The default dimensions are calculated using per-device properties, but initialized once and used on all devices. This patch fixes this problem by introducing per-device default dimensions. 2018-07-30 Tom de Vries <tdevries@suse.de> * plugin/plugin-nvptx.c (struct ptx_device): Add default_dims field. (nvptx_open_device): Init default_dims for device. (nvptx_exec): Use default_dims from device. From-SVN: r263061	2018-07-30 08:17:16 +00:00
Cesar Philippidis	88a4654d03	[libgomp, nvptx] Add error with recompilation hint for launch failure Currently, when a kernel is lauched with too many workers, it results in a cuda launch failure. This is triggered f.i. for parallel-loop-1.c at -O0 on a Quadro M1200. This patch detects this situation, and errors out with a hint on how to fix it. Build and reg-tested on x86_64 with nvptx accelerator. 2018-07-26 Cesar Philippidis <cesar@codesourcery.com> Tom de Vries <tdevries@suse.de> * plugin/plugin-nvptx.c (nvptx_exec): Error if the hardware doesn't have sufficient resources to launch a kernel, and give a hint on how to fix it. Co-Authored-By: Tom de Vries <tdevries@suse.de> From-SVN: r262997	2018-07-26 11:42:29 +00:00
Cesar Philippidis	0c6c2f5fc2	[libgomp, nvptx] Move device property sampling from nvptx_exec to nvptx_open Move sampling of device properties from nvptx_exec to nvptx_open, and assume the sampling always succeeds. This simplifies the default dimension initialization code in nvptx_open. 2018-07-26 Cesar Philippidis <cesar@codesourcery.com> Tom de Vries <tdevries@suse.de> * plugin/plugin-nvptx.c (struct ptx_device): Add warp_size, max_threads_per_block and max_threads_per_multiprocessor fields. (nvptx_open_device): Initialize new fields. (nvptx_exec): Use num_sms, and new fields. Co-Authored-By: Tom de Vries <tdevries@suse.de> From-SVN: r262996	2018-07-26 11:42:19 +00:00
Tom de Vries	ec00d3faf4	[openacc] Move GOMP_OPENACC_DIM parsing out of nvptx plugin 2018-05-02 Tom de Vries <tom@codesourcery.com> PR libgomp/85411 * plugin/plugin-nvptx.c (nvptx_exec): Move parsing of GOMP_OPENACC_DIM ... * env.c (parse_gomp_openacc_dim): ... here. New function. (initialize_env): Call parse_gomp_openacc_dim. (goacc_default_dims): Define. * libgomp.h (goacc_default_dims): Declare. * oacc-plugin.c (GOMP_PLUGIN_acc_default_dim): New function. * oacc-plugin.h (GOMP_PLUGIN_acc_default_dim): Declare. * libgomp.map: New version "GOMP_PLUGIN_1.2". Add GOMP_PLUGIN_acc_default_dim. * testsuite/libgomp.oacc-c-c++-common/loop-default-runtime.c: New test. * testsuite/libgomp.oacc-c-c++-common/loop-default.h: New test. From-SVN: r259852	2018-05-02 17:53:56 +00:00
Tom de Vries	df36a3d3be	[nvptx, libgomp] Add GOMP_NVPTX_JIT=-O[0-4] in nvptx libgomp plugin 2018-04-26 Tom de Vries <tom@codesourcery.com> PR libgomp/84020 * plugin/cuda/cuda.h (CUjit_option): Add CU_JIT_OPTIMIZATION_LEVEL. * plugin/plugin-nvptx.c (_GNU_SOURCE): Define. (process_GOMP_NVPTX_JIT): New function. (link_ptx): Use process_GOMP_NVPTX_JIT. From-SVN: r259678	2018-04-26 13:27:04 +00:00
Jakub Jelinek	85ec4feb11	Update copyright years. From-SVN: r256169	2018-01-03 11:03:58 +01:00
Tom de Vries	12e9c8ce6c	Remove semicolon after do {} while (false) in HSA_LOG 2017-10-31 Tom de Vries <tom@codesourcery.com> * plugin/plugin-hsa.c (HSA_LOG): Remove semicolon after "do {} while (false)". (init_single_kernel, GOMP_OFFLOAD_async_run): Add missing semicolon after HSA_DEBUG call. From-SVN: r254264	2017-10-31 12:56:41 +00:00
Tom de Vries	9607b014b2	Fix secure_getenv.h include in plugin-hsa.c 2017-07-03 Tom de Vries <tom@codesourcery.com> * plugin/plugin-hsa.c: Fix secure_getenv.h include. From-SVN: r249918	2017-07-03 13:40:19 +00:00
Tom de Vries	dfb15f6bbb	Show value of GOMP_OPENACC_DIM in libgomp nvptx plugin 2017-06-27 Tom de Vries <tom@codesourcery.com> * plugin/plugin-nvptx.c (notify_var): New function. (nvptx_exec): Use notify_var for GOMP_OPENACC_DIM. From-SVN: r249695	2017-06-27 15:51:48 +00:00
Tom de Vries	22f1a03704	Use secure_getenv for GOMP_DEBUG 2017-06-27 Tom de Vries <tom@codesourcery.com> * env.c (parse_unsigned_long_1): Factor out of ... (parse_unsigned_long): ... here. (parse_int_1): Factor out of ... (parse_int): ... here. (parse_int_secure): New function. (initialize_env): Use parse_int_secure for GOMP_DEBUG. * secure_getenv.h: Factor out of ... * plugin/plugin-hsa.c: ... here. * testsuite/libgomp.oacc-c-c++-common/gomp-debug-env.c: New test. From-SVN: r249694	2017-06-27 15:51:37 +00:00
Thomas Schwinge	78672bd8fd	libgomp nvptx plugin: Debugging output when disabling nvptx offloading libgomp/ * plugin/plugin-nvptx.c (nvptx_get_num_devices): Debugging output when disabling nvptx offloading. From-SVN: r248400	2017-05-24 08:59:05 +02:00
Thomas Schwinge	0da2f96af0	libgomp hsa plugin: debug output for HSA runtime library loading failure libgomp/ * plugin/plugin-hsa.c (DLSYM_FN, init_hsa_runtime_functions): Debug output for failure. From-SVN: r248277	2017-05-19 15:32:04 +02:00
Jakub Jelinek	19929ba9c9	plugin-nvptx.c (cuda_lib_inited): Use signed char type instead of char. * plugin/plugin-nvptx.c (cuda_lib_inited): Use signed char type instead of char. From-SVN: r246918	2017-04-13 21:59:04 +02:00
Thomas Schwinge	e70ab10d5c	libgomp, nvptx plugin: Make "nvptx_exec" static libgomp/ * plugin/plugin-nvptx.c (nvptx_exec): Make it static. From-SVN: r245127	2017-02-02 15:35:30 +01:00
Thomas Schwinge	345a8c1712	libgomp: Normalize the names of a few functions of the libgomp plugin API libgomp/ * libgomp-plugin.h (GOMP_OFFLOAD_openacc_parallel): Rename to GOMP_OFFLOAD_openacc_exec. Adjust all users. (GOMP_OFFLOAD_openacc_get_current_cuda_device): Rename to GOMP_OFFLOAD_openacc_cuda_get_current_device. Adjust all users. (GOMP_OFFLOAD_openacc_get_current_cuda_context): Rename to GOMP_OFFLOAD_openacc_cuda_get_current_context. Adjust all users. (GOMP_OFFLOAD_openacc_get_cuda_stream): Rename to GOMP_OFFLOAD_openacc_cuda_get_stream. Adjust all users. (GOMP_OFFLOAD_openacc_set_cuda_stream): Rename to GOMP_OFFLOAD_openacc_cuda_set_stream. Adjust all users. From-SVN: r245125	2017-02-02 15:13:57 +01:00
Thomas Schwinge	dced339c8a	libgomp: Provide prototypes for functions implemented by libgomp plugins libgomp/ * libgomp-plugin.h: #include <stdbool.h>. (GOMP_OFFLOAD_get_name, GOMP_OFFLOAD_get_caps) (GOMP_OFFLOAD_get_type, GOMP_OFFLOAD_get_num_devices) (GOMP_OFFLOAD_init_device, GOMP_OFFLOAD_fini_device) (GOMP_OFFLOAD_version, GOMP_OFFLOAD_load_image) (GOMP_OFFLOAD_unload_image, GOMP_OFFLOAD_alloc, GOMP_OFFLOAD_free) (GOMP_OFFLOAD_dev2host, GOMP_OFFLOAD_host2dev) (GOMP_OFFLOAD_dev2dev, GOMP_OFFLOAD_can_run, GOMP_OFFLOAD_run) (GOMP_OFFLOAD_async_run, GOMP_OFFLOAD_openacc_parallel) (GOMP_OFFLOAD_openacc_register_async_cleanup) (GOMP_OFFLOAD_openacc_async_test) (GOMP_OFFLOAD_openacc_async_test_all) (GOMP_OFFLOAD_openacc_async_wait) (GOMP_OFFLOAD_openacc_async_wait_async) (GOMP_OFFLOAD_openacc_async_wait_all) (GOMP_OFFLOAD_openacc_async_wait_all_async) (GOMP_OFFLOAD_openacc_async_set_async) (GOMP_OFFLOAD_openacc_create_thread_data) (GOMP_OFFLOAD_openacc_destroy_thread_data) (GOMP_OFFLOAD_openacc_get_current_cuda_device) (GOMP_OFFLOAD_openacc_get_current_cuda_context) (GOMP_OFFLOAD_openacc_get_cuda_stream) (GOMP_OFFLOAD_openacc_set_cuda_stream): New prototypes. * libgomp.h (struct acc_dispatch_t, struct gomp_device_descr): Use these. * plugin/plugin-hsa.c (GOMP_OFFLOAD_load_image) (GOMP_OFFLOAD_unload_image): Fix argument types. liboffloadmic/ * plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_get_type): Fix return type. (GOMP_OFFLOAD_load_image): Fix argument types. From-SVN: r245062	2017-01-31 15:32:58 +01:00
Pekka Jääskeläinen	5fd1486ce5	Brig front-end 2017-01-24 Pekka Jääskeläinen <pekka@parmance.com> Martin Jambor <mjambor@suse.cz> * Makefile.def (target_modules): Added libhsail-rt. (languages): Added language brig. * Makefile.in: Regenerated. * configure.ac (TOPLEVEL_CONFIGURE_ARGUMENTS): Added tgarget-libhsail-rt. Make brig unsupported on untested architectures. * configure: Regenerated. gcc/ * brig-builtins.def: New file. * builtins.def (DEF_HSAIL_BUILTIN): New macro. (DEF_HSAIL_ATOMIC_BUILTIN): Likewise. (DEF_HSAIL_SAT_BUILTIN): Likewise. (DEF_HSAIL_INTR_BUILTIN): Likewise. (DEF_HSAIL_CVT_ZEROI_SAT_BUILTIN): Likewise. * builtin-types.def (BT_INT8): New. (BT_INT16): Likewise. (BT_UINT8): Likewise. (BT_UINT16): Likewise. (BT_FN_ULONG): Likewise. (BT_FN_UINT_INT): Likewise. (BT_FN_UINT_ULONG): Likewise. (BT_FN_UINT_LONG): Likewise. (BT_FN_UINT_PTR): Likewise. (BT_FN_ULONG_PTR): Likewise. (BT_FN_INT8_FLOAT): Likewise. (BT_FN_INT16_FLOAT): Likewise. (BT_FN_UINT32_FLOAT): Likewise. (BT_FN_UINT16_FLOAT): Likewise. (BT_FN_UINT8_FLOAT): Likewise. (BT_FN_UINT64_FLOAT): Likewise. (BT_FN_UINT16_UINT32): Likewise. (BT_FN_UINT32_UINT16): Likewise. (BT_FN_UINT16_UINT16_UINT16): Likewise. (BT_FN_INT_PTR_INT): Likewise. (BT_FN_UINT_PTR_UINT): Likewise. (BT_FN_LONG_PTR_LONG): Likewise. (BT_FN_ULONG_PTR_ULONG): Likewise. (BT_FN_VOID_UINT64_UINT64): Likewise. (BT_FN_UINT8_UINT8_UINT8): Likewise. (BT_FN_INT8_INT8_INT8): Likewise. (BT_FN_INT16_INT16_INT16): Likewise. (BT_FN_INT_INT_INT): Likewise. (BT_FN_UINT_FLOAT_UINT): Likewise. (BT_FN_FLOAT_UINT_UINT): Likewise. (BT_FN_ULONG_UINT_UINT): Likewise. (BT_FN_ULONG_UINT_PTR): Likewise. (BT_FN_ULONG_ULONG_ULONG): Likewise. (BT_FN_UINT_UINT_UINT): Likewise. (BT_FN_VOID_UINT_PTR): Likewise. (BT_FN_UINT_UINT_PTR: Likewise. (BT_FN_UINT32_UINT64_PTR): Likewise. (BT_FN_INT_INT_UINT_UINT): Likewise. (BT_FN_UINT_UINT_UINT_UINT): Likewise. (BT_FN_UINT_UINT_UINT_PTR): Likewise. (BT_FN_UINT_ULONG_ULONG_UINT): Likewise. (BT_FN_ULONG_ULONG_ULONG_ULONG): Likewise. (BT_FN_LONG_LONG_UINT_UINT): Likewise. (BT_FN_ULONG_ULONG_UINT_UINT): Likewise. (BT_FN_VOID_UINT32_UINT64_PTR): Likewise. (BT_FN_VOID_UINT32_UINT32_PTR): Likewise. (BT_FN_UINT_UINT_UINT_UINT_UINT): Likewise. (BT_FN_UINT_FLOAT_FLOAT_FLOAT_FLOAT): Likewise. (BT_FN_ULONG_ULONG_ULONG_UINT_UINT): Likewise. * doc/frontends.texi: List BRIG FE. * doc/install.texi (Testing): Add BRIG tesring requirements. * doc/invoke.texi (Overall Options): Mention BRIG. * doc/standards.texi (Standards): Doucment BRIG HSA version. gcc/brig/ * Make-lang.in: New file. * brig-builtins.h: Likewise. * brig-c.h: Likewise. * brig-lang.c: Likewise. * brigspec.c: Likewise. * config-lang.in: Likewise. * lang-specs.h: Likewise. * lang.opt: Likewise. * brigfrontend/brig-arg-block-handler.cc: Likewise. * brigfrontend/brig-atomic-inst-handler.cc: Likewise. * brigfrontend/brig-basic-inst-handler.cc: Likewise. * brigfrontend/brig-branch-inst-handler.cc: Likewise. * brigfrontend/brig-cmp-inst-handler.cc: Likewise. * brigfrontend/brig-code-entry-handler.cc: Likewise. * brigfrontend/brig-code-entry-handler.h: Likewise. * brigfrontend/brig-comment-handler.cc: Likewise. * brigfrontend/brig-control-handler.cc: Likewise. * brigfrontend/brig-copy-move-inst-handler.cc: Likewise. * brigfrontend/brig-cvt-inst-handler.cc: Likewise. * brigfrontend/brig-fbarrier-handler.cc: Likewise. * brigfrontend/brig-function-handler.cc: Likewise. * brigfrontend/brig-function.cc: Likewise. * brigfrontend/brig-function.h: Likewise. * brigfrontend/brig-inst-mod-handler.cc: Likewise. * brigfrontend/brig-label-handler.cc: Likewise. * brigfrontend/brig-lane-inst-handler.cc: Likewise. * brigfrontend/brig-machine.c: Likewise. * brigfrontend/brig-machine.h: Likewise. * brigfrontend/brig-mem-inst-handler.cc: Likewise. * brigfrontend/brig-module-handler.cc: Likewise. * brigfrontend/brig-queue-inst-handler.cc: Likewise. * brigfrontend/brig-seg-inst-handler.cc: Likewise. * brigfrontend/brig-signal-inst-handler.cc: Likewise. * brigfrontend/brig-to-generic.cc: Likewise. * brigfrontend/brig-to-generic.h: Likewise. * brigfrontend/brig-util.cc: Likewise. * brigfrontend/brig-util.h: Likewise. * brigfrontend/brig-variable-handler.cc: Likewise. * brigfrontend/phsa.h: Likewise. gcc/testsuite/ * lib/brig-dg.exp: New file. * lib/brig.exp: Likewise. * brig.dg/README: Likewise. * brig.dg/dg.exp: Likewise. * brig.dg/test/gimple/alloca.hsail: Likewise. * brig.dg/test/gimple/atomics.hsail: Likewise. * brig.dg/test/gimple/branches.hsail: Likewise. * brig.dg/test/gimple/fbarrier.hsail: Likewise. * brig.dg/test/gimple/function_calls.hsail: Likewise. * brig.dg/test/gimple/kernarg.hsail: Likewise. * brig.dg/test/gimple/mem.hsail: Likewise. * brig.dg/test/gimple/mulhi.hsail: Likewise. * brig.dg/test/gimple/packed.hsail: Likewise. * brig.dg/test/gimple/smoke_test.hsail: Likewise. * brig.dg/test/gimple/variables.hsail: Likewise. * brig.dg/test/gimple/vector.hsail: Likewise. include/ * hsa.h: Moved here from libgomp/plugin/hsa.h. libgomp/ * plugin/hsa.h: Moved to top level include. * plugin/plugin-hsa.c: Chanfgd include of hsa.h accordingly. libhsail-rt/ * Makefile.am: New file. * target-config.h.in: Likewise. * configure.ac: Likewise. * configure: Likewise. * config.h.in: Likewise. * aclocal.m4: Likewise. * README: Likewise. * Makefile.in: Likewise. * include/internal/fibers.h: Likewise. * include/internal/phsa-queue-interface.h: Likewise. * include/internal/phsa-rt.h: Likewise. * include/internal/workitems.h: Likewise. * rt/arithmetic.c: Likewise. * rt/atomics.c: Likewise. * rt/bitstring.c: Likewise. * rt/fbarrier.c: Likewise. * rt/fibers.c: Likewise. * rt/fp16.c: Likewise. * rt/misc.c: Likewise. * rt/multimedia.c: Likewise. * rt/queue.c: Likewise. * rt/sat_arithmetic.c: Likewise. * rt/segment.c: Likewise. * rt/workitems.c: Likewise. Co-Authored-By: Martin Jambor <mjambor@suse.cz> From-SVN: r244867	2017-01-24 13:45:56 +01:00
Jakub Jelinek	b32e85fa42	cuda.h (CUdeviceptr): Typedef to unsigned long long even for _WIN64. * plugin/cuda/cuda.h (CUdeviceptr): Typedef to unsigned long long even for _WIN64. From-SVN: r244638	2017-01-19 16:53:51 +01:00
Jakub Jelinek	d190d5c093	hsa.h: Add GCC runtime library exception. * plugin/hsa.h: Add GCC runtime library exception. * plugin/hsa_ext_finalize.h: Likewise. From-SVN: r244523	2017-01-17 10:45:23 +01:00
Jakub Jelinek	2393d337e7	configfrag.ac: For --without-cuda-driver don't initialize CUDA_DRIVER_INCLUDE nor CUDA_DRIVER_LIB. * plugin/configfrag.ac: For --without-cuda-driver don't initialize CUDA_DRIVER_INCLUDE nor CUDA_DRIVER_LIB. If both CUDA_DRIVER_INCLUDE and CUDA_DRIVER_LIB are empty and linking small cuda program fails, define PLUGIN_NVPTX_DYNAMIC to 1 and use plugin/include/cuda as include dir and -ldl instead of -lcuda as library to link ptx plugin against. * plugin/plugin-nvptx.c: Include dlfcn.h if PLUGIN_NVPTX_DYNAMIC. (CUDA_CALLS): Define. (cuda_lib, cuda_lib_inited): New variables. (init_cuda_lib): New function. (CUDA_CALL_PREFIX): Define. (CUDA_CALL_ERET, CUDA_CALL_ASSERT): Use CUDA_CALL_PREFIX. (CUDA_CALL): Use FN instead of (FN). (CUDA_CALL_NOCHECK): Define. (cuda_error, fini_streams_for_device, select_stream_for_async, nvptx_attach_host_thread_to_device, nvptx_open_device, link_ptx, event_gc, nvptx_exec, nvptx_async_test, nvptx_async_test_all, nvptx_wait_all, nvptx_set_clocktick, GOMP_OFFLOAD_unload_image, nvptx_stacks_alloc, nvptx_stacks_free, GOMP_OFFLOAD_run): Use CUDA_CALL_NOCHECK. (nvptx_init): Call init_cuda_lib, if it fails, return false. Use CUDA_CALL_NOCHECK. (nvptx_get_num_devices): Call init_cuda_lib, if it fails, return 0. Use CUDA_CALL_NOCHECK. * plugin/cuda/cuda.h: New file. * config.h.in: Regenerated. * configure: Regenerated. From-SVN: r244522	2017-01-17 10:44:17 +01:00
Jakub Jelinek	cbe34bb5ed	Update copyright years. From-SVN: r243994	2017-01-01 13:07:43 +01:00

1 2

84 Commits