OpenE2K/gcc - gcc - Expired Mentality Git

Author	SHA1	Message	Date
Maciej W. Rozycki	6c84c8bf9b	Add OpenACC 2.6 `acc_get_property' support Add generic support for the OpenACC 2.6 `acc_get_property' and `acc_get_property_string' routines, as well as full handlers for the host and the NVPTX offload targets and minimal handlers for the HSA, Intel MIC, and AMD GCN offload targets. Included are C/C++ and Fortran tests that, in particular, print the property values for acc_property_vendor, acc_property_memory, acc_property_free_memory, acc_property_name, and acc_property_driver. The output looks as follows: Vendor: GNU Name: GOMP Total memory: 0 Free memory: 0 Driver: 1.0 with the host driver (where the memory related properties are not supported for the host device and yield 0, conforming to the standard) and output like: Vendor: Nvidia Total memory: 12651462656 Free memory: 12202737664 Name: TITAN V Driver: CUDA Driver 9.1 with the NVPTX driver. 2019-12-22 Maciej W. Rozycki <macro@codesourcery.com> Frederik Harwath <frederik@codesourcery.com> Thomas Schwinge <tschwinge@codesourcery.com> include/ * gomp-constants.h (gomp_device_property): New enum. libgomp/ * libgomp.h (gomp_device_descr): Add `get_property_func' member. * libgomp-plugin.h (gomp_device_property_value): New union. (gomp_device_property_value): New prototype. * openacc.h (acc_device_t): Add `acc_device_current' enumeration constant. (acc_device_property_t): New enum. (acc_get_property, acc_get_property_string): New prototypes. * oacc-init.c (acc_get_device_type): Also assert that result is not `acc_device_current'. (get_property_any, acc_get_property, acc_get_property_string): New functions. * openacc.f90 (openacc_kinds): Add `acc_device_current' and `acc_property_memory', `acc_property_free_memory', `acc_property_name', `acc_property_vendor' and `acc_property_driver' constants. Add `acc_device_property' data type. (openacc_internal): Add `acc_get_property' and `acc_get_property_string' interfaces. Add `acc_get_property_h', `acc_get_property_string_h', `acc_get_property_l' and `acc_get_property_string_l'. * oacc-host.c (host_get_property): New function. (host_dispatch): Wire it. * target.c (gomp_load_plugin_for_device): Handle `get_property'. * libgomp.map (OACC_2.6): Add `acc_get_property', `acc_get_property_h_', `acc_get_property_string' and `acc_get_property_string_h_' symbols. * libgomp.texi (OpenACC Runtime Library Routines): Add `acc_get_property'. (acc_get_property): New node. * plugin/plugin-gcn.c (GOMP_OFFLOAD_get_property): New function (stub). * plugin/plugin-hsa.c (GOMP_OFFLOAD_get_property): New function. * plugin/plugin-nvptx.c (CUDA_CALLS): Add `cuDeviceGetName', `cuDeviceTotalMem', `cuDriverGetVersion' and `cuMemGetInfo' calls. (GOMP_OFFLOAD_get_property): New function. (struct ptx_device): Add new field "name". (cuda_driver_version_s): Add new static variable ... (nvptx_init): ... and init from here. * testsuite/libgomp.oacc-c-c++-common/acc_get_property.c: New test. * testsuite/libgomp.oacc-c-c++-common/acc_get_property-2.c: New test. * testsuite/libgomp.oacc-c-c++-common/acc_get_property-3.c: New test. * testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c: New file with test helper functions. * testsuite/libgomp.oacc-fortran/acc_get_property.f90: New test. liboffloadmic/ * plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_get_property): New function. Reviewed-by: Thomas Schwinge <thomas@codesourcery.com> Co-Authored-By: Frederik Harwath <frederik@codesourcery.com> Co-Authored-By: Thomas Schwinge <tschwinge@codesourcery.com> From-SVN: r279710	2019-12-22 19:54:09 +00:00
Andrew Stubbs	d2903ce05b	Add device number to GOMP_OFFLOAD_openacc_async_construct 2019-11-13 Andrew Stubbs <ams@codesourcery.com> Julian Brown <julian@codesourcery.com> libgomp/ * libgomp-plugin.h (GOMP_OFFLOAD_openacc_async_construct): Add int parameter. * oacc-async.c (lookup_goacc_asyncqueue): Pass device number to the queue constructor. * oacc-host.c (host_openacc_async_construct): Add device parameter. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_openacc_async_construct): Add device parameter. Co-Authored-By: Julian Brown <julian@codesourcery.com> From-SVN: r278134	2019-11-13 12:37:59 +00:00
Jakub Jelinek	810f316dd6	configure.ac: Remove GCC_HEADER_STDINT(gstdint.h). * configure.ac: Remove GCC_HEADER_STDINT(gstdint.h). * libgomp.h: Include <stdint.h> instead of "gstdint.h". * oacc-parallel.c: Don't include "libgomp_g.h". * plugin/plugin-hsa.c: Include <stdint.h> instead of "gstdint.h". * plugin/plugin-nvptx.c: Don't include "gstdint.h". * aclocal.m4: Regenerated. * config.h.in: Regenerated. * configure: Regenerated. * Makefile.in: Regenerated. From-SVN: r276389	2019-10-01 09:51:46 +02:00
Jakub Jelinek	b5c26449f3	re PR libgomp/90585 (libgomp hsa plugin ftbfs in the x32 multilib variant) PR libgomp/90585 * plugin/plugin-hsa.c: Include gstdint.h. Include inttypes.h only if HAVE_INTTYPES_H is defined. (print_uint64_t): New typedef. (PRIu64): Define if HAVE_INTTYPES_H is not defined. (print_kernel_dispatch, run_kernel): Use PRIu64 macro instead of "lu", cast uint64_t HSA_DEBUG and fprintf arguments to print_uint64_t. (release_kernel_dispatch): Likewise. Cast shadow->debug to uintptr_t before casting to void . plugin/plugin-nvptx.c: Include gstdint.h instead of stdint.h. * oacc-mem.c: Don't include config.h nor stdint.h. * target.c: Don't include config.h. * oacc-cuda.c: Likewise. * oacc-host.c: Don't include stdint.h. From-SVN: r271597	2019-05-24 10:59:37 +02:00
Thomas Schwinge	5fae049dc2	OpenACC Profiling Interface (incomplete) libgomp/ * acc_prof.h: New file. * oacc-profiling.c: Likewise. * Makefile.am (nodist_libsubinclude_HEADERS, libgomp_la_SOURCES): Add these, respectively. * Makefile.in: Regenerate. * env.c (initialize_env): Call goacc_profiling_initialize. * oacc-plugin.c (GOMP_PLUGIN_goacc_thread) (GOMP_PLUGIN_goacc_profiling_dispatch): New functions. * oacc-plugin.h (GOMP_PLUGIN_goacc_thread) (GOMP_PLUGIN_goacc_profiling_dispatch): Declare. * libgomp.map (OACC_2.5.1): Add acc_prof_lookup, acc_prof_register, acc_prof_unregister, and acc_register_library. (GOMP_PLUGIN_1.3): Add GOMP_PLUGIN_goacc_profiling_dispatch, and GOMP_PLUGIN_goacc_thread. * oacc-int.h (struct goacc_thread): Add prof_info, api_info, prof_callbacks_enabled members. (goacc_prof_enabled, goacc_profiling_initialize) (_goacc_profiling_dispatch_p, _goacc_profiling_setup_p) (goacc_profiling_dispatch): Declare. (GOACC_PROF_ENABLED, GOACC_PROFILING_DISPATCH_P) (GOACC_PROFILING_SETUP_P): Define. * oacc-async.c (acc_async_test, acc_async_test_all, acc_wait) (acc_wait_async, acc_wait_all, acc_wait_all_async): Update for OpenACC Profiling Interface. * oacc-cuda.c (acc_get_current_cuda_device) (acc_get_current_cuda_context, acc_get_cuda_stream) (acc_set_cuda_stream): Likewise. * oacc-init.c (acc_init_1, goacc_attach_host_thread_to_device) (acc_init, acc_set_device_type, acc_get_device_type) (acc_get_device_num, goacc_lazy_initialize): Likewise. * oacc-mem.c (acc_malloc, acc_free, memcpy_tofrom_device) (acc_deviceptr, acc_hostptr, acc_is_present, acc_map_data) (acc_unmap_data, present_create_copy, delete_copyout) (update_dev_host): Likewise. * oacc-parallel.c (GOACC_parallel_keyed, GOACC_data_start) (GOACC_data_end, GOACC_enter_exit_data, GOACC_update, GOACC_wait): Likewise. * plugin/plugin-nvptx.c (nvptx_exec, nvptx_alloc, nvptx_free) (GOMP_OFFLOAD_openacc_exec, GOMP_OFFLOAD_openacc_async_exec): Likewise. * libgomp.texi: Update. * testsuite/libgomp.oacc-c-c++-common/acc_prof-dispatch-1.c: New file. * testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/acc_prof-valid_bytes-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/acc_prof-version-1.c: Likewise. From-SVN: r271346	2019-05-17 21:13:36 +02:00
Chung-Lin Tang	1f4c5b9bb2	2019-05-13 Chung-Lin Tang <cltang@codesourcery.com> Reviewed-by: Thomas Schwinge <thomas@codesourcery.com> libgomp/ * libgomp-plugin.h (struct goacc_asyncqueue): Declare. (struct goacc_asyncqueue_list): Likewise. (goacc_aq): Likewise. (goacc_aq_list): Likewise. (GOMP_OFFLOAD_openacc_register_async_cleanup): Remove. (GOMP_OFFLOAD_openacc_async_test): Remove. (GOMP_OFFLOAD_openacc_async_test_all): Remove. (GOMP_OFFLOAD_openacc_async_wait): Remove. (GOMP_OFFLOAD_openacc_async_wait_async): Remove. (GOMP_OFFLOAD_openacc_async_wait_all): Remove. (GOMP_OFFLOAD_openacc_async_wait_all_async): Remove. (GOMP_OFFLOAD_openacc_async_set_async): Remove. (GOMP_OFFLOAD_openacc_exec): Adjust declaration. (GOMP_OFFLOAD_openacc_cuda_get_stream): Likewise. (GOMP_OFFLOAD_openacc_cuda_set_stream): Likewise. (GOMP_OFFLOAD_openacc_async_exec): Declare. (GOMP_OFFLOAD_openacc_async_construct): Declare. (GOMP_OFFLOAD_openacc_async_destruct): Declare. (GOMP_OFFLOAD_openacc_async_test): Declare. (GOMP_OFFLOAD_openacc_async_synchronize): Declare. (GOMP_OFFLOAD_openacc_async_serialize): Declare. (GOMP_OFFLOAD_openacc_async_queue_callback): Declare. (GOMP_OFFLOAD_openacc_async_host2dev): Declare. (GOMP_OFFLOAD_openacc_async_dev2host): Declare. * libgomp.h (struct acc_dispatch_t): Define 'async' sub-struct. (gomp_acc_insert_pointer): Adjust declaration. (gomp_copy_host2dev): New declaration. (gomp_copy_dev2host): Likewise. (gomp_map_vars_async): Likewise. (gomp_unmap_tgt): Likewise. (gomp_unmap_vars_async): Likewise. (gomp_fini_device): Likewise. * oacc-async.c (get_goacc_thread): New function. (get_goacc_thread_device): New function. (lookup_goacc_asyncqueue): New function. (get_goacc_asyncqueue): New function. (acc_async_test): Adjust code to use new async design. (acc_async_test_all): Likewise. (acc_wait): Likewise. (acc_wait_async): Likewise. (acc_wait_all): Likewise. (acc_wait_all_async): Likewise. (goacc_async_free): New function. (goacc_init_asyncqueues): Likewise. (goacc_fini_asyncqueues): Likewise. * oacc-cuda.c (acc_get_cuda_stream): Adjust code to use new async design. (acc_set_cuda_stream): Likewise. * oacc-host.c (host_openacc_exec): Adjust parameters, remove 'async'. (host_openacc_register_async_cleanup): Remove. (host_openacc_async_exec): New function. (host_openacc_async_test): Adjust parameters. (host_openacc_async_test_all): Remove. (host_openacc_async_wait): Remove. (host_openacc_async_wait_async): Remove. (host_openacc_async_wait_all): Remove. (host_openacc_async_wait_all_async): Remove. (host_openacc_async_set_async): Remove. (host_openacc_async_synchronize): New function. (host_openacc_async_serialize): New function. (host_openacc_async_host2dev): New function. (host_openacc_async_dev2host): New function. (host_openacc_async_queue_callback): New function. (host_openacc_async_construct): New function. (host_openacc_async_destruct): New function. (struct gomp_device_descr host_dispatch): Remove initialization of old interface, add intialization of new async sub-struct. * oacc-init.c (acc_shutdown_1): Adjust to use gomp_fini_device. (goacc_attach_host_thread_to_device): Remove old async code usage. * oacc-int.h (goacc_init_asyncqueues): New declaration. (goacc_fini_asyncqueues): Likewise. (goacc_async_copyout_unmap_vars): Likewise. (goacc_async_free): Likewise. (get_goacc_asyncqueue): Likewise. (lookup_goacc_asyncqueue): Likewise. * oacc-mem.c (memcpy_tofrom_device): Adjust code to use new async design. (present_create_copy): Adjust code to use new async design. (delete_copyout): Likewise. (update_dev_host): Likewise. (gomp_acc_insert_pointer): Add async parameter, adjust code to use new async design. (gomp_acc_remove_pointer): Adjust code to use new async design. * oacc-parallel.c (GOACC_parallel_keyed): Adjust code to use new async design. (GOACC_enter_exit_data): Likewise. (goacc_wait): Likewise. (GOACC_update): Likewise. * oacc-plugin.c (GOMP_PLUGIN_async_unmap_vars): Change to assert fail when called, warn as obsolete in comment. * target.c (goacc_device_copy_async): New function. (gomp_copy_host2dev): Remove 'static', add goacc_asyncqueue parameter, add goacc_device_copy_async case. (gomp_copy_dev2host): Likewise. (gomp_map_vars_existing): Add goacc_asyncqueue parameter, adjust code. (gomp_map_pointer): Likewise. (gomp_map_fields_existing): Likewise. (gomp_map_vars_internal): New always_inline function, renamed from gomp_map_vars. (gomp_map_vars): Implement by calling gomp_map_vars_internal. (gomp_map_vars_async): Implement by calling gomp_map_vars_internal, passing goacc_asyncqueue argument. (gomp_unmap_tgt): Remove static, add attribute_hidden. (gomp_unref_tgt): New function. (gomp_unmap_vars_internal): New always_inline function, renamed from gomp_unmap_vars. (gomp_unmap_vars): Implement by calling gomp_unmap_vars_internal. (gomp_unmap_vars_async): Implement by calling gomp_unmap_vars_internal, passing goacc_asyncqueue argument. (gomp_fini_device): New function. (gomp_exit_data): Adjust gomp_copy_dev2host call. (gomp_load_plugin_for_device): Remove old interface, adjust to load new async interface. (gomp_target_fini): Adjust code to call gomp_fini_device. * plugin/plugin-nvptx.c (struct cuda_map): Remove. (struct ptx_stream): Remove. (struct nvptx_thread): Remove current_stream field. (cuda_map_create): Remove. (cuda_map_destroy): Remove. (map_init): Remove. (map_fini): Remove. (map_pop): Remove. (map_push): Remove. (struct goacc_asyncqueue): Define. (struct nvptx_callback): Define. (struct ptx_free_block): Define. (struct ptx_device): Remove null_stream, active_streams, async_streams, stream_lock, and next fields. (enum ptx_event_type): Remove. (struct ptx_event): Remove. (ptx_event_lock): Remove. (ptx_events): Remove. (init_streams_for_device): Remove. (fini_streams_for_device): Remove. (select_stream_for_async): Remove. (nvptx_init): Remove ptx_events and ptx_event_lock references. (nvptx_attach_host_thread_to_device): Remove CUDA_ERROR_NOT_PERMITTED case. (nvptx_open_device): Add free_blocks initialization, remove init_streams_for_device call. (nvptx_close_device): Remove fini_streams_for_device call, add free_blocks destruct code. (event_gc): Remove. (event_add): Remove. (nvptx_exec): Adjust parameters and code. (nvptx_free): Likewise. (nvptx_host2dev): Remove. (nvptx_dev2host): Remove. (nvptx_set_async): Remove. (nvptx_async_test): Remove. (nvptx_async_test_all): Remove. (nvptx_wait): Remove. (nvptx_wait_async): Remove. (nvptx_wait_all): Remove. (nvptx_wait_all_async): Remove. (nvptx_get_cuda_stream): Remove. (nvptx_set_cuda_stream): Remove. (GOMP_OFFLOAD_alloc): Adjust code. (GOMP_OFFLOAD_free): Likewise. (GOMP_OFFLOAD_openacc_register_async_cleanup): Remove. (GOMP_OFFLOAD_openacc_exec): Adjust parameters and code. (GOMP_OFFLOAD_openacc_async_test_all): Remove. (GOMP_OFFLOAD_openacc_async_wait): Remove. (GOMP_OFFLOAD_openacc_async_wait_async): Remove. (GOMP_OFFLOAD_openacc_async_wait_all): Remove. (GOMP_OFFLOAD_openacc_async_wait_all_async): Remove. (GOMP_OFFLOAD_openacc_async_set_async): Remove. (cuda_free_argmem): New function. (GOMP_OFFLOAD_openacc_async_exec): New plugin hook function. (GOMP_OFFLOAD_openacc_create_thread_data): Adjust code. (GOMP_OFFLOAD_openacc_cuda_get_stream): Adjust code. (GOMP_OFFLOAD_openacc_cuda_set_stream): Adjust code. (GOMP_OFFLOAD_openacc_async_construct): New plugin hook function. (GOMP_OFFLOAD_openacc_async_destruct): New plugin hook function. (GOMP_OFFLOAD_openacc_async_test): Remove and re-implement. (GOMP_OFFLOAD_openacc_async_synchronize): New plugin hook function. (GOMP_OFFLOAD_openacc_async_serialize): New plugin hook function. (GOMP_OFFLOAD_openacc_async_queue_callback): New plugin hook function. (cuda_callback_wrapper): New function. (cuda_memcpy_sanity_check): New function. (GOMP_OFFLOAD_host2dev): Remove and re-implement. (GOMP_OFFLOAD_dev2host): Remove and re-implement. (GOMP_OFFLOAD_openacc_async_host2dev): New plugin hook function. (GOMP_OFFLOAD_openacc_async_dev2host): New plugin hook function. From-SVN: r271128	2019-05-13 13:32:00 +00:00
Tom de Vries	738c56d410	[nvptx, libgomp] Fix memleak in GOMP_OFFLOAD_fini_device I wrote a test-case: ... int main (void) { for (unsigned i = 0; i < 128; ++i) { acc_init (acc_device_nvidia); acc_shutdown (acc_device_nvidia); } return 0; } ... and ran it under valgrind. The only leak location reported with a frequency of 128, was the allocation of ptx_devices in nvptx_init. Fix this by freeing ptx_devices in GOMP_OFFLOAD_fini_device, once instantiated_devices drops to 0. 2019-01-24 Tom de Vries <tdevries@suse.de> * plugin/plugin-nvptx.c (GOMP_OFFLOAD_fini_device): Free ptx_devices once instantiated_devices drops to 0. From-SVN: r268237	2019-01-24 14:12:19 +00:00
Tom de Vries	4a75460b00	[nvptx, libgomp] Fix cuMemAlloc with size zero Consider test-case: ... int main (void) { #pragma acc parallel async ; #pragma acc parallel async ; #pragma acc wait return 0; } ... This fails with: ... libgomp: cuMemAlloc error: invalid argument Segmentation fault (core dumped) ... The cuMemAlloc error is due to the fact that we're try to allocate 0 bytes. Fix this by preventing calling map_push with size zero argument in nvptx_exec. This also has the consequence that for the abort-1.c test-case, we end up calling cuMemFree during map_fini for the struct cuda_map allocated in map_init, which fails because an abort happened. Fix this by calling cuMemFree with CUDA_CALL_NOCHECK in cuda_map_destroy. 2019-01-23 Tom de Vries <tdevries@suse.de> PR target/PR88946 * plugin/plugin-nvptx.c (cuda_map_destroy): Use CUDA_CALL_NOCHECK for cuMemFree. (nvptx_exec): Don't call map_push if mapnum == 0. * testsuite/libgomp.oacc-c-c++-common/pr88946.c: New test. From-SVN: r268178	2019-01-23 08:16:56 +00:00
Tom de Vries	4fef8e4d8c	[nvptx, libgomp] Fix assert (!s->map->active) in map_fini There are currently two situations where this assert triggers: ... libgomp/plugin/plugin-nvptx.c: map_fini: Assertion `!s->map->active' failed. ... First, in abort-1.c, a parallel region triggering an abort: ... int main (void) { #pragma acc parallel abort (); return 0; } ... The abort is detected in nvptx_exec as the CUDA_ERROR_ILLEGAL_INSTRUCTION return status of the cuStreamSynchronize call after kernel launch, which is then handled by calling non-returning function GOMP_PLUGIN_fatal. Consequently, the map_pop in nvptx_exec that in case of cuStreamSynchronize success would remove or inactive the element added by the map_push earlier in nvptx_exec, does not trigger. With the element no longer active, but still marked active and a member of s->map, we run into the assert during GOMP_OFFLOAD_fini_device, which is triggered from atexit handler gomp_target_fini (which is triggered by the GOMP_PLUGIN_fatal mentioned above calling exit). Second, in pr88941.c, an async parallel region without wait: ... int main (void) { #pragma acc parallel async ; /* no #pragma acc wait / return 0; } ... Because nvptx_exec is handling an async region, it does not call map_pop for the element added by map_push, but schedules an kernel execution completion event to call map_pop. Again, we run into the assert during GOMP_OFFLOAD_fini_device, which is triggered from atexit handler gomp_target_fini, but the exit in this case is triggered by returning from main. So either the kernel is still running, or the kernel has completed but the corresponding event that is supposed to call map_pop is stuck in the event queue, waiting for an event_gc. Fix this by removing the assert, and skipping the freeing of device memory if the map is still marked active (though in the async case, this is more a workaround than an fix). 2019-01-23 Tom de Vries <tdevries@suse.de> PR target/88941 PR target/88939 plugin/plugin-nvptx.c (cuda_map_destroy): Handle map->active case. (map_fini): Remove "assert (!s->map->active)". * testsuite/libgomp.oacc-c-c++-common/pr88941.c: New test. From-SVN: r268177	2019-01-23 08:16:42 +00:00
Tom de Vries	2ee6cb22c1	[nvptx, libgomp] Fix map_push The map field of a struct ptx_stream is a FIFO. The FIFO is implemented as a single linked list, with pop-from-the-front semantics. The function map_pop pops an element, either by: - deallocating the element, if there is more than one element - or marking the element inactive, if there's only one element The responsibility of map_push is to push an element to the back, as well as selecting the element to push, by: - allocating an element, or - reusing the element at the front if inactive and big enough, or - dropping the element at the front if inactive and not big enough, and allocating one that's big enough The current implemention gets at least the first and most basic scenario wrong: > map = cuda_map_create (size); We create an element, and assign it to map. > for (t = s->map; t->next != NULL; t = t->next) > ; We determine the last element in the fifo. > t->next = map; We append the new element. > s->map = map; But here, we throw away the rest of the FIFO, and declare the FIFO to be just the new element. This problem causes the test-case asyncwait-1.c to fail intermittently on some systems. The pr87835.c test-case added here is a a minimized and modified version of asyncwait-1.c (avoiding the kernel construct) that is more likely to fail. Fix this by rewriting map_pop more robustly, by: - seperating the function in two phases: select element, push element - when reusing or dropping an element, making sure that the element is cleanly popped from the queue - rewriting the push element part in such a way that it can handle all cases without needing if statements, such that each line is exercised for each of the three cases. 2019-01-23 Tom de Vries <tdevries@suse.de> PR target/87835 * plugin/plugin-nvptx.c (map_push): Fix adding of allocated element. * testsuite/libgomp.oacc-c-c++-common/pr87835.c: New test. From-SVN: r268176	2019-01-23 08:16:11 +00:00
Tom de Vries	2c2ff1684d	[nvptx] Enable setting vector length using -fopenacc-dim Enable setting vector length using -fopenacc-dim, f.i. -fopenacc-dim=::128. 2019-01-12 Tom de Vries <tdevries@suse.de> * config/nvptx/nvptx.c (nvptx_goacc_validate_dims_1): Alow setting vector length using -fopenacc-dim. * plugin/plugin-nvptx.c (nvptx_exec): Update error message. From-SVN: r267896	2019-01-12 22:19:15 +00:00
Tom de Vries	52d22ece49	[nvptx] Update insufficient launch message for variable vector_length Update message in nvptx libgomp plugin about insufficient resources to launch kernel, to accommodate for the fact the vector_length can now be variable. 2019-01-12 Tom de Vries <tdevries@suse.de> * plugin/plugin-nvptx.c (nvptx_exec): Update insufficient hardware resources diagnostic. From-SVN: r267890	2019-01-12 22:18:00 +00:00
Tom de Vries	052aaaceed	[nvptx] Don't allow vector_length 64 with num_workers 16 When using a compiler build with: ... +#define PTX_DEFAULT_VECTOR_LENGTH PTX_CTA_SIZE ... consider a test-case: ... int main (void) { #pragma acc parallel vector_length (64) #pragma acc loop worker for (unsigned int i = 0; i < 32; i++) #pragma acc loop vector for (unsigned int j = 0; j < 64; j++) ; return 0; } ... If num_workers is 16, either because: - we add a "num_workers (16)" clause on the parallel directive, or - we set "GOMP_OPENACC_DIM=:16:", or - the libgomp plugin chooses 16 num_workers we run into an illegal instruction at runtime, because a bar.sync instruction tries to use a barrier 16. The instruction is illegal, because ptx supports only 16 barriers per CTA, and the valid range is 0..15. The problem is that with a warp-multiple vector length, we use a code generation scheme with a per-worker barrier. And because barrier zero is reserved for per-cta barrier, only the remaining 15 barriers can be used as per-worker barrier, and consequently we can't use num_workers larger than 15. This problem occurs only for vector_length 64. For vector_length 32, we use a different code generation scheme, and for vector_length >= 96, the maximum num_workers is not big enough not to trigger this problem. Also, this problem only occurs for num_workers 16. As explained above, num_workers 15 is safe to use, and 16 is already the maximum num_workers for vector_length 64. This patch fixes the problem in both the compiler (handling "num_workers (16)") and in the libgomp nvptx plugin (with and without "GOMP_OPENACC_DIM=:16:"). 2019-01-11 Tom de Vries <tdevries@suse.de> * config/nvptx/nvptx.c (PTX_CTA_NUM_BARRIERS, PTX_PER_CTA_BARRIER) (PTX_NUM_PER_CTA_BARRIER, PTX_FIRST_PER_WORKER_BARRIER) (PTX_NUM_PER_WORKER_BARRIERS): Define. (nvptx_apply_dim_limits): Prevent vector_length 64 and num_workers 16. * plugin/plugin-nvptx.c (nvptx_exec): Prevent vector_length 64 and num_workers 16. From-SVN: r267838	2019-01-11 11:46:43 +00:00
Tom de Vries	2c372e81a9	[nvptx, libgomp] Don't launch with num_workers == 0 When using a compiler build with: ... +#define PTX_DEFAULT_VECTOR_LENGTH PTX_CTA_SIZE +#define PTX_MAX_VECTOR_LENGTH PTX_CTA_SIZE ... and running the libgomp testsuite, we run into an execution failure in parallel-loop-1.c, due to a cuda launch failure: ... nvptx_exec: kernel f6_none_none$_omp_fn$0: launch gangs=480, workers=0, \ vectors=1024 libgomp: cuLaunchKernel error: invalid argument ... because workers == 0. The workers variable is set to 0 here in nvptx_exec: ... workers = blocks / actual_vectors; ... because actual_vectors is 1024, and blocks is 768: ... cuOccupancyMaxPotentialBlockSize: grid = 10, block = 768 ... Fix this by ensuring that workers is at least one. 2019-01-09 Tom de Vries <tdevries@suse.de> * plugin/plugin-nvptx.c (nvptx_exec): Make sure to launch with at least one worker. From-SVN: r267746	2019-01-09 00:07:45 +00:00
Jakub Jelinek	a554497024	Update copyright years. From-SVN: r267494	2019-01-01 13:31:55 +01:00
Thomas Schwinge	f847198ec3	[PR88495] An OpenACC async queue is always synchronized with itself An OpenACC async queue is always synchronized with itself, so invocations like "#pragma acc wait(0) async(0)", or "acc_wait_async (0, 0)" don't make a lot of sense, but are still valid. libgomp/ PR libgomp/88495 * plugin/plugin-nvptx.c (nvptx_wait_async): Don't refuse "identical parameters". * testsuite/libgomp.oacc-c-c++-common/asyncwait-nop-1.c: Update. * testsuite/libgomp.oacc-c-c++-common/lib-80.c: Remove. From-SVN: r267152	2018-12-14 21:43:02 +01:00
Thomas Schwinge	1404af62dc	[PR88407] [OpenACC] Correctly handle unseen async-arguments ... which turn the operation into a no-op. libgomp/ PR libgomp/88407 * plugin/plugin-nvptx.c (nvptx_async_test, nvptx_wait) (nvptx_wait_async): Unseen async-argument is a no-op. * testsuite/libgomp.oacc-c-c++-common/async_queue-1.c: Update. * testsuite/libgomp.oacc-c-c++-common/data-2-lib.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/data-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-79.c: Likewise. * testsuite/libgomp.oacc-fortran/lib-12.f90: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-71.c: Merge into... * testsuite/libgomp.oacc-c-c++-common/lib-69.c: ... this. Update. * testsuite/libgomp.oacc-c-c++-common/lib-77.c: Merge into... * testsuite/libgomp.oacc-c-c++-common/lib-74.c: ... this. Update From-SVN: r267150	2018-12-14 21:42:40 +01:00
Thomas Schwinge	18c247cc0b	[PR88370] acc_get_cuda_stream/acc_set_cuda_stream: acc_async_sync, acc_async_noval Per my reading of the OpenACC specification (and as supported by secondary documentation, such as code examples, or presentations), it's valid to call "acc_get_cuda_stream"/"acc_set_cuda_stream" also with "acc_async_sync", "acc_async_noval" arguments, not just with the nonnegative values as currently implemented. libgomp/ PR libgomp/88370 * libgomp.texi (acc_get_current_cuda_context, acc_get_cuda_stream) (acc_set_cuda_stream): Clarify. * oacc-cuda.c (acc_get_cuda_stream, acc_set_cuda_stream): Use "async_valid_p". * plugin/plugin-nvptx.c (nvptx_set_cuda_stream): Refuse "async == acc_async_sync". * testsuite/libgomp.oacc-c-c++-common/acc_set_cuda_stream-1.c: New file. * testsuite/libgomp.oacc-c-c++-common/async_queue-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-84.c: Update. * testsuite/libgomp.oacc-c-c++-common/lib-85.c: Likewise. From-SVN: r267147	2018-12-14 21:42:08 +01:00
Cesar Philippidis	2049befdd0	[nvptx] Remove use of CUDA unified memory in libgomp libgomp/ * plugin/plugin-nvptx.c (struct cuda_map): New. (struct ptx_stream): Replace d, h, h_begin, h_end, h_next, h_prev, h_tail with (cuda_map *) map. (cuda_map_create): New function. (cuda_map_destroy): New function. (map_init): Update to use a linked list of cuda_map objects. (map_fini): Likewise. (map_pop): Likewise. (map_push): Likewise. Return CUdeviceptr instead of void. (init_streams_for_device): Remove stales references to ptx_stream members. (select_stream_for_async): Likewise. (nvptx_exec): Update call to map_init. From-SVN: r264397	2018-09-18 08:41:54 -07:00
Cesar Philippidis	bd9b3d3d1a	[nvptx] Use CUDA driver API to select default runtime launch geometry The CUDA driver API starting version 6.5 offers a set of runtime functions to calculate several occupancy-related measures, as a replacement for the occupancy calculator spreadsheet. This patch adds a heuristic for default runtime launch geometry, based on the new runtime function cuOccupancyMaxPotentialBlockSize. Build on x86_64 with nvptx accelerator and ran libgomp testsuite. 2018-08-13 Cesar Philippidis <cesar@codesourcery.com> Tom de Vries <tdevries@suse.de> PR target/85590 * plugin/cuda/cuda.h (CUoccupancyB2DSize): New typedef. (cuOccupancyMaxPotentialBlockSize): Declare. * plugin/cuda-lib.def (cuOccupancyMaxPotentialBlockSize): New CUDA_ONE_CALL_MAYBE_NULL. * plugin/plugin-nvptx.c (CUDA_VERSION < 6050): Define CUoccupancyB2DSize and declare cuOccupancyMaxPotentialBlockSize. (nvptx_exec): Use cuOccupancyMaxPotentialBlockSize to set the default num_gangs and num_workers when the driver supports it. Co-Authored-By: Tom de Vries <tdevries@suse.de> From-SVN: r263505	2018-08-13 12:04:24 +00:00
Tom de Vries	8e09a12f01	[libgomp, nvptx] Fall back to cuLinkAddData/cuLinkCreate if _v2 not found Cuda driver api functions cuLinkAddData and cuLinkCreate are available starting version 5.5. In version 6.5, they are remapped onto _v2 versions. The dlopen interface of the libgomp nvptx plugin uses the _v2 versions, so it won't work with a cuda driver with driver api version lower than 6.5. This patch fixes the problem by testing for the presence of the _v2 versions, and falling back to the original versions in case of absence of the _v2 versions. Build on x86_64 with nvptx accelerator and reg-tested libgomp, both with and without --without-cuda-driver. 2018-08-08 Tom de Vries <tdevries@suse.de> * plugin/cuda-lib.def (cuLinkAddData_v2, cuLinkCreate_v2): Declare using CUDA_ONE_CALL_MAYBE_NULL. * plugin/plugin-nvptx.c (cuLinkAddData, cuLinkCreate): Undef and declare. (cuLinkAddData_v2, cuLinkCreate_v2): Declare. (link_ptx): Fall back to cuLinkAddData/cuLinkCreate if the _v2 versions are not found. From-SVN: r263408	2018-08-08 14:26:37 +00:00
Tom de Vries	cedd9bd016	[libgomp, nvptx] Allow cuGetErrorString to be NULL Cuda driver api function cuGetErrorString is available in version 6.0 and higher. Currently, when the driver that is used does not contain this function, the libgomp nvptx plugin will not build (PLUGIN_NVPTX_DYNAMIC == 0) or run (PLUGIN_NVPTX_DYNAMIC == 1). This patch fixes this problem by testing for the presence of the function, and handling absence. Build on x86_64 with nvptx accelerator and reg-tested libgomp, both with and without --without-cuda-driver. 2018-08-08 Tom de Vries <tdevries@suse.de> * plugin/cuda-lib.def (cuGetErrorString): Use CUDA_ONE_CALL_MAYBE_NULL. * plugin/plugin-nvptx.c (cuda_error): Handle if cuGetErrorString is not present. From-SVN: r263407	2018-08-08 14:26:28 +00:00
Tom de Vries	b113af959c	[libgomp, nvptx] Remove hard-coded const in nvptx_open_device CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR is defined in cuda driver api version 6.0 and higher. Currently nvptx_open_device uses a hard-coded constant instead. This patch fixes that by: - defining CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR to the hardcoded constant at toplevel, if not present in cuda.h, and - using CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR in nvptx_open_device Build on x86_64 with nvptx accelerator and reg-tested libgomp. 2018-08-08 Tom de Vries <tdevries@suse.de> * plugin/plugin-nvptx.c (CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR): Define. (nvptx_open_device): Use CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR. From-SVN: r263406	2018-08-08 14:26:19 +00:00
Tom de Vries	94767dacea	[libgomp, nvptx] Note that cuGetErrorString is in CUDA_VERSION >= 6000 Cuda driver api function cuGetErrorString is available in version 6.0 and higher. This patch: - removes a comment saying the declaration is not available in cuda.h 6.0 - fixes the presence test to use CUDA_VERSION < 6000 - moves the declaration to toplevel Build on x86_64 with nvptx accelerator and reg-tested libgomp. 2018-08-08 Tom de Vries <tdevries@suse.de> * plugin/plugin-nvptx.c (cuda_error): Move declaration of cuGetErrorString ... (cuGetErrorString): ... here. Guard with CUDA_VERSION < 6000. From-SVN: r263405	2018-08-08 14:26:10 +00:00
Tom de Vries	02150de863	[libgomp, nvptx] Handle CUDA_ONE_CALL_MAYBE_NULL This patch adds handling of functions that may not be present in the cuda driver. Such a function can be declared using CUDA_ONE_CALL_MAYBE_NULL in cuda-lib.def, it can be called with the usual convenience macros, but before calling its presence needs to be tested using new macro CUDA_CALL_EXISTS. When using the dlopen interface (PLUGIN_NVPTX_DYNAMIC == 1), we allow non-present functions by allowing dlsym to return NULL. Otherwise (PLUGIN_NVPTX_DYNAMIC == 0) we declare the non-present function to be weak. Build and reg-tested libgomp on x86_64 with nvidia accelerator, with and without --disable-cuda-driver, in combination with a trigger patch that adds a non-existing function foo to cuda-lib.def: ... CUDA_ONE_CALL_MAYBE_NULL (foo) ... and declares it in plugin-nvptx.c: ... CUresult foo (void); ... and then uses it in nvptx_init after the init_cuda_lib call: ... if (CUDA_CALL_EXISTS (foo)) CUDA_CALL (foo); ... Also build and reg-tested on x86_64 with nvidia accelerator, with and without --disable-cuda-driver, in combination with a trigger patch that replaces all CUDA_ONE_CALLs in cuda-lib.def with CUDA_ONE_CALL_MAYBE_NULL, and guards two CUDA_CALLs with CUDA_CALL_EXISTS, one for a regular fn, and one for a fn that is a define in cuda/cuda.h. 2018-08-07 Tom de Vries <tdevries@suse.de> * plugin/plugin-nvptx.c (DO_PRAGMA): Define. (struct cuda_lib_s): Add def/undef of CUDA_ONE_CALL_MAYBE_NULL. (init_cuda_lib): Add new param to CUDA_ONE_CALL_1. Add arg to corresponding call in CUDA_ONE_CALL. Add def/undef of CUDA_ONE_CALL_MAYBE_NULL. (CUDA_CALL_EXISTS): Define. From-SVN: r263346	2018-08-06 22:13:56 +00:00
Tom de Vries	9e28b10779	[libgomp, nvptx] Minimize lifetime of CUDA_ONE_CALL defines This patch makes sure that the lifetimes of the CUDA_ONE_CALL macro (which is defined twice in plugin-nvptx.c) are minimized, to make it obvious that the definitions are used only in the lib-cuda.def include. Build on x86_64 with nvptx accelerator and reg-tested libgomp. 2018-08-07 Tom de Vries <tdevries@suse.de> * plugin/plugin-nvptx.c (struct cuda_lib_s, init_cuda_lib): Put CUDA_ONE_CALL defines right before the cuda-lib.def include, and the corresponding undefs right after. From-SVN: r263345	2018-08-06 22:13:46 +00:00
Cesar Philippidis	094db6beb9	[PATCH] Remove use of 'struct map' from plugin (nvptx) libgomp/ * plugin/plugin-nvptx.c (struct map): Removed. (map_init, map_pop): Remove use of struct map. (map_push): Likewise and change argument list. * testsuite/libgomp.oacc-c-c++-common/mapping-1.c: New Co-Authored-By: James Norris <jnorris@codesourcery.com> From-SVN: r263212	2018-08-01 07:09:56 -07:00
Tom de Vries	8c6310a2c2	[libgomp, nvptx] Add cuda-lib.def 2018-08-01 Tom de Vries <tdevries@suse.de> * plugin/cuda-lib.def: New file. Factor out of ... * plugin/plugin-nvptx.c (CUDA_CALLS): ... here. (struct cuda_lib_s, init_cuda_lib): Include cuda-lib.def instead of using CUDA_CALLS. From-SVN: r263208	2018-08-01 13:20:22 +00:00
Tom de Vries	4cdfee3f20	[libgomp, nvptx] Handle per-function max-threads-per-block in default dims Currently parallel-loop-1.c fails at -O0 on a Quadro M1200, because one of the kernel launch configurations exceeds the resources available in the device, due to the default dimensions chosen by the runtime. This patch fixes that by taking the per-function max_threads_per_block into account when using the default dimensions. 2018-07-30 Tom de Vries <tdevries@suse.de> * plugin/plugin-nvptx.c (MIN, MAX): Redefine. (nvptx_exec): Ensure worker and vector default dims don't exceed targ_fn->max_threads_per_block. From-SVN: r263062	2018-07-30 08:17:26 +00:00
Tom de Vries	0b210c43bb	[libgomp, nvptx] Calculate default dims per device The default dimensions are calculated using per-device properties, but initialized once and used on all devices. This patch fixes this problem by introducing per-device default dimensions. 2018-07-30 Tom de Vries <tdevries@suse.de> * plugin/plugin-nvptx.c (struct ptx_device): Add default_dims field. (nvptx_open_device): Init default_dims for device. (nvptx_exec): Use default_dims from device. From-SVN: r263061	2018-07-30 08:17:16 +00:00
Cesar Philippidis	88a4654d03	[libgomp, nvptx] Add error with recompilation hint for launch failure Currently, when a kernel is lauched with too many workers, it results in a cuda launch failure. This is triggered f.i. for parallel-loop-1.c at -O0 on a Quadro M1200. This patch detects this situation, and errors out with a hint on how to fix it. Build and reg-tested on x86_64 with nvptx accelerator. 2018-07-26 Cesar Philippidis <cesar@codesourcery.com> Tom de Vries <tdevries@suse.de> * plugin/plugin-nvptx.c (nvptx_exec): Error if the hardware doesn't have sufficient resources to launch a kernel, and give a hint on how to fix it. Co-Authored-By: Tom de Vries <tdevries@suse.de> From-SVN: r262997	2018-07-26 11:42:29 +00:00
Cesar Philippidis	0c6c2f5fc2	[libgomp, nvptx] Move device property sampling from nvptx_exec to nvptx_open Move sampling of device properties from nvptx_exec to nvptx_open, and assume the sampling always succeeds. This simplifies the default dimension initialization code in nvptx_open. 2018-07-26 Cesar Philippidis <cesar@codesourcery.com> Tom de Vries <tdevries@suse.de> * plugin/plugin-nvptx.c (struct ptx_device): Add warp_size, max_threads_per_block and max_threads_per_multiprocessor fields. (nvptx_open_device): Initialize new fields. (nvptx_exec): Use num_sms, and new fields. Co-Authored-By: Tom de Vries <tdevries@suse.de> From-SVN: r262996	2018-07-26 11:42:19 +00:00
Tom de Vries	ec00d3faf4	[openacc] Move GOMP_OPENACC_DIM parsing out of nvptx plugin 2018-05-02 Tom de Vries <tom@codesourcery.com> PR libgomp/85411 * plugin/plugin-nvptx.c (nvptx_exec): Move parsing of GOMP_OPENACC_DIM ... * env.c (parse_gomp_openacc_dim): ... here. New function. (initialize_env): Call parse_gomp_openacc_dim. (goacc_default_dims): Define. * libgomp.h (goacc_default_dims): Declare. * oacc-plugin.c (GOMP_PLUGIN_acc_default_dim): New function. * oacc-plugin.h (GOMP_PLUGIN_acc_default_dim): Declare. * libgomp.map: New version "GOMP_PLUGIN_1.2". Add GOMP_PLUGIN_acc_default_dim. * testsuite/libgomp.oacc-c-c++-common/loop-default-runtime.c: New test. * testsuite/libgomp.oacc-c-c++-common/loop-default.h: New test. From-SVN: r259852	2018-05-02 17:53:56 +00:00
Tom de Vries	df36a3d3be	[nvptx, libgomp] Add GOMP_NVPTX_JIT=-O[0-4] in nvptx libgomp plugin 2018-04-26 Tom de Vries <tom@codesourcery.com> PR libgomp/84020 * plugin/cuda/cuda.h (CUjit_option): Add CU_JIT_OPTIMIZATION_LEVEL. * plugin/plugin-nvptx.c (_GNU_SOURCE): Define. (process_GOMP_NVPTX_JIT): New function. (link_ptx): Use process_GOMP_NVPTX_JIT. From-SVN: r259678	2018-04-26 13:27:04 +00:00
Jakub Jelinek	85ec4feb11	Update copyright years. From-SVN: r256169	2018-01-03 11:03:58 +01:00
Tom de Vries	dfb15f6bbb	Show value of GOMP_OPENACC_DIM in libgomp nvptx plugin 2017-06-27 Tom de Vries <tom@codesourcery.com> * plugin/plugin-nvptx.c (notify_var): New function. (nvptx_exec): Use notify_var for GOMP_OPENACC_DIM. From-SVN: r249695	2017-06-27 15:51:48 +00:00
Thomas Schwinge	78672bd8fd	libgomp nvptx plugin: Debugging output when disabling nvptx offloading libgomp/ * plugin/plugin-nvptx.c (nvptx_get_num_devices): Debugging output when disabling nvptx offloading. From-SVN: r248400	2017-05-24 08:59:05 +02:00
Jakub Jelinek	19929ba9c9	plugin-nvptx.c (cuda_lib_inited): Use signed char type instead of char. * plugin/plugin-nvptx.c (cuda_lib_inited): Use signed char type instead of char. From-SVN: r246918	2017-04-13 21:59:04 +02:00
Thomas Schwinge	e70ab10d5c	libgomp, nvptx plugin: Make "nvptx_exec" static libgomp/ * plugin/plugin-nvptx.c (nvptx_exec): Make it static. From-SVN: r245127	2017-02-02 15:35:30 +01:00
Thomas Schwinge	345a8c1712	libgomp: Normalize the names of a few functions of the libgomp plugin API libgomp/ * libgomp-plugin.h (GOMP_OFFLOAD_openacc_parallel): Rename to GOMP_OFFLOAD_openacc_exec. Adjust all users. (GOMP_OFFLOAD_openacc_get_current_cuda_device): Rename to GOMP_OFFLOAD_openacc_cuda_get_current_device. Adjust all users. (GOMP_OFFLOAD_openacc_get_current_cuda_context): Rename to GOMP_OFFLOAD_openacc_cuda_get_current_context. Adjust all users. (GOMP_OFFLOAD_openacc_get_cuda_stream): Rename to GOMP_OFFLOAD_openacc_cuda_get_stream. Adjust all users. (GOMP_OFFLOAD_openacc_set_cuda_stream): Rename to GOMP_OFFLOAD_openacc_cuda_set_stream. Adjust all users. From-SVN: r245125	2017-02-02 15:13:57 +01:00
Jakub Jelinek	2393d337e7	configfrag.ac: For --without-cuda-driver don't initialize CUDA_DRIVER_INCLUDE nor CUDA_DRIVER_LIB. * plugin/configfrag.ac: For --without-cuda-driver don't initialize CUDA_DRIVER_INCLUDE nor CUDA_DRIVER_LIB. If both CUDA_DRIVER_INCLUDE and CUDA_DRIVER_LIB are empty and linking small cuda program fails, define PLUGIN_NVPTX_DYNAMIC to 1 and use plugin/include/cuda as include dir and -ldl instead of -lcuda as library to link ptx plugin against. * plugin/plugin-nvptx.c: Include dlfcn.h if PLUGIN_NVPTX_DYNAMIC. (CUDA_CALLS): Define. (cuda_lib, cuda_lib_inited): New variables. (init_cuda_lib): New function. (CUDA_CALL_PREFIX): Define. (CUDA_CALL_ERET, CUDA_CALL_ASSERT): Use CUDA_CALL_PREFIX. (CUDA_CALL): Use FN instead of (FN). (CUDA_CALL_NOCHECK): Define. (cuda_error, fini_streams_for_device, select_stream_for_async, nvptx_attach_host_thread_to_device, nvptx_open_device, link_ptx, event_gc, nvptx_exec, nvptx_async_test, nvptx_async_test_all, nvptx_wait_all, nvptx_set_clocktick, GOMP_OFFLOAD_unload_image, nvptx_stacks_alloc, nvptx_stacks_free, GOMP_OFFLOAD_run): Use CUDA_CALL_NOCHECK. (nvptx_init): Call init_cuda_lib, if it fails, return false. Use CUDA_CALL_NOCHECK. (nvptx_get_num_devices): Call init_cuda_lib, if it fails, return 0. Use CUDA_CALL_NOCHECK. * plugin/cuda/cuda.h: New file. * config.h.in: Regenerated. * configure: Regenerated. From-SVN: r244522	2017-01-17 10:44:17 +01:00
Jakub Jelinek	cbe34bb5ed	Update copyright years. From-SVN: r243994	2017-01-01 13:07:43 +01:00
Alexander Monakov	6103184e81	OpenMP offloading to NVPTX: libgomp changes * Makefile.am (libgomp_la_SOURCES): Add atomic.c, icv.c, icv-device.c. * Makefile.in. Regenerate. * configure.ac [nvptx--] (libgomp_use_pthreads): Set and use it... (LIBGOMP_USE_PTHREADS): ...here; new define. configure: Regenerate. * config.h.in: Likewise. * config/posix/affinity.c: Move to... * affinity.c: ...here (new file). Guard use of Pthreads-specific interface by LIBGOMP_USE_PTHREADS. * critical.c: Split out GOMP_atomic_{start,end} into... * atomic.c: ...here (new file). * env.c: Split out ICV definitions into... * icv.c: ...here (new file) and... * icv-device.c: ...here. New file. * config/linux/lock.c (gomp_init_lock_30): Move to generic lock.c. (gomp_destroy_lock_30): Ditto. (gomp_set_lock_30): Ditto. (gomp_unset_lock_30): Ditto. (gomp_test_lock_30): Ditto. (gomp_init_nest_lock_30): Ditto. (gomp_destroy_nest_lock_30): Ditto. (gomp_set_nest_lock_30): Ditto. (gomp_unset_nest_lock_30): Ditto. (gomp_test_nest_lock_30): Ditto. * lock.c: New. * config/nvptx/lock.c: New. * config/nvptx/bar.c: New. * config/nvptx/bar.h: New. * config/nvptx/doacross.h: New. * config/nvptx/error.c: New. * config/nvptx/icv-device.c: New. * config/nvptx/mutex.h: New. * config/nvptx/pool.h: New. * config/nvptx/proc.c: New. * config/nvptx/ptrlock.h: New. * config/nvptx/sem.h: New. * config/nvptx/simple-bar.h: New. * config/nvptx/target.c: New. * config/nvptx/task.c: New. * config/nvptx/team.c: New. * config/nvptx/time.c: New. * config/posix/simple-bar.h: New. * libgomp.h: Guard pthread.h inclusion. Include simple-bar.h. (gomp_num_teams_var): Declare. (struct gomp_thread_pool): Change threads_dock member to gomp_simple_barrier_t. [__nvptx__] (gomp_thread): New implementation. (gomp_thread_attr): Guard by LIBGOMP_USE_PTHREADS. (gomp_thread_destructor): Ditto. (gomp_init_thread_affinity): Ditto. * team.c: Guard uses of Pthreads-specific interfaces by LIBGOMP_USE_PTHREADS. Adjust all uses of threads_dock. (gomp_free_thread) [__nvptx__]: Do not call 'free'. * config/nvptx/alloc.c: Delete. * config/nvptx/barrier.c: Ditto. * config/nvptx/fortran.c: Ditto. * config/nvptx/iter.c: Ditto. * config/nvptx/iter_ull.c: Ditto. * config/nvptx/loop.c: Ditto. * config/nvptx/loop_ull.c: Ditto. * config/nvptx/ordered.c: Ditto. * config/nvptx/parallel.c: Ditto. * config/nvptx/priority_queue.c: Ditto. * config/nvptx/sections.c: Ditto. * config/nvptx/single.c: Ditto. * config/nvptx/splay-tree.c: Ditto. * config/nvptx/work.c: Ditto. * testsuite/libgomp.fortran/fortran.exp (lang_link_flags): Pass -foffload=-lgfortran in addition to -lgfortran. * testsuite/libgomp.oacc-fortran/fortran.exp (lang_link_flags): Ditto. * plugin/plugin-nvptx.c: Include <limits.h>. (struct targ_fn_descriptor): Add new fields. (struct ptx_device): Ditto. Set them... (nvptx_open_device): ...here. (nvptx_adjust_launch_bounds): New. (nvptx_host2dev): Allow NULL 'nvthd'. (nvptx_dev2host): Ditto. (GOMP_OFFLOAD_get_caps): Add GOMP_OFFLOAD_CAP_OPENMP_400. (link_ptx): Adjust log sizes. (nvptx_host2dev): Allow NULL 'nvthd'. (nvptx_dev2host): Ditto. (nvptx_set_clocktick): New. Use it... (GOMP_OFFLOAD_load_image): ...here. Set new targ_fn_descriptor fields. (GOMP_OFFLOAD_dev2dev): New. (nvptx_adjust_launch_bounds): New. (nvptx_stacks_size): New. (nvptx_stacks_alloc): New. (nvptx_stacks_free): New. (GOMP_OFFLOAD_run): New. (GOMP_OFFLOAD_async_run): New (stub). Co-Authored-By: Dmitry Melnik <dm@ispras.ru> Co-Authored-By: Jakub Jelinek <jakub@redhat.com> From-SVN: r242789	2016-11-23 21:36:41 +03:00
Cesar Philippidis	6668eb4593	nvptx.c (PTX_GANG_DEFAULT): Set to zero. gcc/ * config/nvptx/nvptx.c (PTX_GANG_DEFAULT): Set to zero. libgomp/ * plugin/plugin-nvptx.c (nvptx_exec): Interrogate board attributes to determine default geometry. * testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Set gang dimension. Co-Authored-By: Nathan Sidwell <nathan@acm.org> From-SVN: r241803	2016-11-02 15:10:02 -07:00
Chung-Lin Tang	b4557008c4	oacc-plugin.h (GOMP_PLUGIN_async_unmap_vars): Add int parameter. 2016-05-26 Chung-Lin Tang <cltang@codesourcery.com> libgomp/ * oacc-plugin.h (GOMP_PLUGIN_async_unmap_vars): Add int parameter. * oacc-plugin.c (GOMP_PLUGIN_async_unmap_vars): Add 'int async' parameter, use to set async stream around call to gomp_unmap_vars, call gomp_unmap_vars() with 'do_copyfrom' set to true. * plugin/plugin-nvptx.c (struct ptx_event): Add 'int val' field. (event_gc): Adjust event handling loop, collect PTX_EVT_ASYNC_CLEANUP events and call GOMP_PLUGIN_async_unmap_vars() for each of them. (event_add): Add int parameter, initialize 'val' field when adding new ptx_event struct. (nvptx_evec): Adjust event_add() call arguments. (nvptx_host2dev): Likewise. (nvptx_dev2host): Likewise. (nvptx_wait_async): Likewise. (nvptx_wait_all_async): Likewise. (GOMP_OFFLOAD_openacc_register_async_cleanup): Add async parameter, pass to event_add() call. * oacc-host.c (host_openacc_register_async_cleanup): Add 'int async' parameter. * oacc-mem.c (gomp_acc_remove_pointer): Adjust async case to call openacc.register_async_cleanup_func() hook. * oacc-parallel.c (GOACC_parallel_keyed): Likewise. * target.c (gomp_copy_from_async): Delete function. (gomp_map_vars): Remove async_refcount. (gomp_unmap_vars): Likewise. (gomp_load_image_to_device): Likewise. (omp_target_associate_ptr): Likewise. * libgomp.h (struct splay_tree_key_s): Remove async_refcount. (acc_dispatch_t.register_async_cleanup_func): Add int parameter. (gomp_copy_from_async): Remove. From-SVN: r236772	2016-05-26 13:28:25 +00:00
Chung-Lin Tang	6ce1307231	target.c (gomp_device_copy): New function. libgomp/ 2016-05-26 Chung-Lin Tang <cltang@codesourcery.com> * target.c (gomp_device_copy): New function. (gomp_copy_host2dev): Likewise. (gomp_copy_dev2host): Likewise. (gomp_free_device_memory): Likewise. (gomp_map_vars_existing): Adjust to call gomp_copy_host2dev. (gomp_map_pointer): Likewise. (gomp_map_vars): Adjust to call gomp_copy_host2dev, handle NULL value from alloc_func plugin hook. (gomp_unmap_tgt): Adjust to call gomp_free_device_memory. (gomp_copy_from_async): Adjust to call gomp_copy_dev2host. (gomp_unmap_vars): Likewise. (gomp_update): Adjust to call gomp_copy_dev2host and gomp_copy_host2dev functions. (gomp_unload_image_from_device): Handle false value from unload_image_func plugin hook. (gomp_init_device): Handle false value from init_device_func plugin hook. (gomp_exit_data): Adjust to call gomp_copy_dev2host. (omp_target_free): Adjust to call gomp_free_device_memory. (omp_target_memcpy): Handle return values from host2dev_func, dev2host_func, and dev2dev_func plugin hooks. (omp_target_memcpy_rect_worker): Likewise. (gomp_target_fini): Handle false value from fini_device_func plugin hook. * libgomp.h (struct gomp_device_descr): Adjust return type of init_device_func, fini_device_func, unload_image_func, free_func, dev2host_func,host2dev_func, and dev2dev_func plugin hooks to 'bool'. * oacc-init.c (acc_shutdown_1): Handle false value from fini_device_func plugin hook. * oacc-host.c (host_init_device): Change return type to bool. (host_fini_device): Likewise. (host_unload_image): Likewise. (host_free): Likewise. (host_dev2host): Likewise. (host_host2dev): Likewise. * oacc-mem.c (acc_free): Handle plugin hook fatal error case. (acc_memcpy_to_device): Likewise. (acc_memcpy_from_device): Likewise. (delete_copyout): Add libfnname parameter, handle free_func hook fatal error case. (acc_delete): Adjust delete_copyout call. (acc_copyout): Likewise. (update_dev_host): Move gomp_mutex_unlock to after host2dev/dev2host hook calls. * plugin/plugin-hsa.c (hsa_warn): Adjust 'hsa_error' local variable to 'hsa_error_msg', for clarity. (hsa_fatal): Likewise. (hsa_error): New function. (init_hsa_context): Change return type to bool, adjust to return false on error. (GOMP_OFFLOAD_get_num_devices): Adjust to handle init_hsa_context return value. (GOMP_OFFLOAD_init_device): Change return type to bool, adjust to return false on error. (get_agent_info): Adjust to return NULL on error. (destroy_hsa_program): Change return type to bool, adjust to return false on error. (GOMP_OFFLOAD_load_image): Adjust to return -1 on error. (destroy_module): Change return type to bool, adjust to return false on error. (GOMP_OFFLOAD_unload_image): Likewise. (GOMP_OFFLOAD_fini_device): Likewise. (GOMP_OFFLOAD_alloc): Change to return NULL when called. (GOMP_OFFLOAD_free): Change to return false when called. (GOMP_OFFLOAD_dev2host): Likewise. (GOMP_OFFLOAD_host2dev): Likewise. (GOMP_OFFLOAD_dev2dev): Likewise. * plugin/plugin-nvptx.c (CUDA_CALL_ERET): New convenience macro. (CUDA_CALL): Likewise. (CUDA_CALL_ASSERT): Likewise. (map_init): Change return type to bool, use CUDA_CALL* macros. (map_fini): Likewise. (init_streams_for_device): Change return type to bool, adjust call to map_init. (fini_streams_for_device): Change return type to bool, adjust call to map_fini. (select_stream_for_async): Release stream_lock before calls to GOMP_PLUGIN_fatal, adjust call to map_init. (nvptx_init): Use CUDA_CALL* macros. (nvptx_attach_host_thread_to_device): Change return type to bool, use CUDA_CALL* macros. (nvptx_open_device): Use CUDA_CALL* macros. (nvptx_close_device): Change return type to bool, use CUDA_CALL* macros. (nvptx_get_num_devices): Use CUDA_CALL* macros. (link_ptx): Change return type to bool, use CUDA_CALL* macros. (nvptx_exec): Use CUDA_CALL* macros. (nvptx_alloc): Use CUDA_CALL* macros. (nvptx_free): Change return type to bool, use CUDA_CALL* macros. (nvptx_host2dev): Likewise. (nvptx_dev2host): Likewise. (nvptx_wait): Use CUDA_CALL* macros. (nvptx_wait_async): Likewise. (nvptx_wait_all): Likewise. (nvptx_wait_all_async): Likewise. (nvptx_set_cuda_stream): Adjust order of stream_lock acquire, use CUDA_CALL* macros, adjust call to map_fini. (GOMP_OFFLOAD_init_device): Change return type to bool, adjust code accordingly. (GOMP_OFFLOAD_fini_device): Likewise. (GOMP_OFFLOAD_load_image): Adjust calls to nvptx_attach_host_thread_to_device/link_ptx to handle errors, use CUDA_CALL* macros. (GOMP_OFFLOAD_unload_image): Change return type to bool, adjust return code. (GOMP_OFFLOAD_alloc): Adjust calls to code to handle error return. (GOMP_OFFLOAD_free): Change return type to bool, adjust calls to handle error return. (GOMP_OFFLOAD_dev2host): Likewise. (GOMP_OFFLOAD_host2dev): Likewise. (GOMP_OFFLOAD_openacc_register_async_cleanup): Use CUDA_CALL* macros. (GOMP_OFFLOAD_openacc_create_thread_data): Likewise. liboffloadmic/ 2016-05-26 Chung-Lin Tang <cltang@codesourcery.com> * plugin/libgomp-plugin-intelmic.cpp (offload): Change return type to bool, adjust return code. (GOMP_OFFLOAD_init_device): Likewise. (GOMP_OFFLOAD_fini_device): Likewise. (get_target_table): Likewise. (offload_image): Likwise. (GOMP_OFFLOAD_load_image): Adjust call to offload_image(), change to return -1 on error. (GOMP_OFFLOAD_unload_image): Change return type to bool, adjust return code. (GOMP_OFFLOAD_alloc): Likewise. (GOMP_OFFLOAD_free): Likewise. (GOMP_OFFLOAD_host2dev): Likewise. (GOMP_OFFLOAD_dev2host): Likewise. (GOMP_OFFLOAD_dev2dev): Likewise. From-SVN: r236768	2016-05-26 09:58:56 +00:00
Alexander Monakov	c2bd3b6911	libgomp nvptx plugin: make cuMemFreeHost error non-fatal From-SVN: r235339	2016-04-21 16:11:47 +03:00
Thomas Schwinge	f99c355797	Use plain -fopenacc to enable OpenACC kernels processing gcc/ * tree-parloops.c (create_parallel_loop, gen_parallel_loop) (parallelize_loops): In OpenACC kernels mode, set n_threads to zero. (pass_parallelize_loops::gate): In OpenACC kernels mode, gate on flag_openacc. * tree-ssa-loop.c (gate_oacc_kernels): Likewise. gcc/testsuite/ * c-c++-common/goacc/kernels-counter-vars-function-scope.c: Adjust to -ftree-parallelize-loops/-fopenacc changes. * c-c++-common/goacc/kernels-double-reduction-n.c: Likewise. * c-c++-common/goacc/kernels-double-reduction.c: Likewise. * c-c++-common/goacc/kernels-loop-2.c: Likewise. * c-c++-common/goacc/kernels-loop-3.c: Likewise. * c-c++-common/goacc/kernels-loop-g.c: Likewise. * c-c++-common/goacc/kernels-loop-mod-not-zero.c: Likewise. * c-c++-common/goacc/kernels-loop-n.c: Likewise. * c-c++-common/goacc/kernels-loop-nest.c: Likewise. * c-c++-common/goacc/kernels-loop.c: Likewise. * c-c++-common/goacc/kernels-one-counter-var.c: Likewise. * c-c++-common/goacc/kernels-reduction.c: Likewise. * gfortran.dg/goacc/kernels-loop-inner.f95: Likewise. * gfortran.dg/goacc/kernels-loops-adjacent.f95: Likewise. libgomp/ * oacc-parallel.c (GOACC_parallel_keyed): Initialize dims. * plugin/plugin-nvptx.c (nvptx_exec): Provide default values for dims. * testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: Adjust to -ftree-parallelize-loops/-fopenacc changes. * testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-loop.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c: Likewise. From-SVN: r233634	2016-02-23 16:07:54 +01:00
Alexander Monakov	0d58938ed7	nvptx plugin: do not force JIT target SM version When link_ptx runs, a CUDA device is already bound to current thread, so the driver library knows the target architecture. There isn't any benefit from forcing a specific target here; on the contrary, hardcoding sm_30 breaks offloading on later (Maxwell, sm_5x) devices. * plugin/plugin-nvptx.c (link_ptx): Do not set CU_JIT_TARGET. From-SVN: r232227	2016-01-11 15:55:31 +03:00
Jakub Jelinek	818ab71a41	Update copyright years. From-SVN: r232055	2016-01-04 15:30:50 +01:00

1 2

65 Commits