OpenE2K/gcc - gcc - Expired Mentality Git

Author	SHA1	Message	Date
Andrew Stubbs	f062c3f115	amdgcn: Switch to HSACO v3 binary format This upgrades the compiler to emit HSA Code Object v3 binaries. This means changing the assembler directives, and linker command line options. The gcn-run and libgomp loaders need corresponding alterations. The relocations no longer need to be fixed up manually, and the kernel symbol names have changed slightly. This move makes the binaries compatible with the new rocgdb from ROCm 3.5. 2020-06-17 Andrew Stubbs <ams@codesourcery.com> gcc/ * config/gcn/gcn-hsa.h (TEXT_SECTION_ASM_OP): Use ".text". (BSS_SECTION_ASM_OP): Use ".bss". (ASM_SPEC): Remove "-mattr=-code-object-v3". (LINK_SPEC): Add "--export-dynamic". * config/gcn/gcn-opts.h (processor_type): Replace PROCESSOR_VEGA with PROCESSOR_VEGA10 and PROCESSOR_VEGA20. * config/gcn/gcn-run.c (HSA_RUNTIME_LIB): Use ".so.1" variant. (load_image): Remove obsolete relocation handling. Add ".kd" suffix to the symbol names. * config/gcn/gcn.c (MAX_NORMAL_SGPR_COUNT): Set to 62. (gcn_option_override): Update gcn_isa test. (gcn_kernel_arg_types): Update all the assembler directives. Remove the obsolete options. (gcn_conditional_register_usage): Update MAX_NORMAL_SGPR_COUNT usage. (gcn_omp_device_kind_arch_isa): Handle PROCESSOR_VEGA10 and PROCESSOR_VEGA20. (output_file_start): Rework assembler file header. (gcn_hsa_declare_function_name): Rework kernel metadata. * config/gcn/gcn.h (GCN_KERNEL_ARG_TYPES): Set to 16. * config/gcn/gcn.opt (PROCESSOR_VEGA): Remove enum. (PROCESSOR_VEGA10): New enum value. (PROCESSOR_VEGA20): New enum value. libgomp/ * plugin/plugin-gcn.c (init_environment_variables): Use ".so.1" variant for HSA_RUNTIME_LIB name. (find_executable_symbol_1): Delete. (find_executable_symbol): Delete. (init_kernel_properties): Add ".kd" suffix to symbol names. (find_load_offset): Delete. (create_and_finalize_hsa_program): Remove relocation handling.	2020-06-17 10:06:21 +01:00
Andrew Stubbs	966de09be9	amdgcn: Check HSA return codes [PR94629] Ensure that the returned status values are not ignored. The old code was not broken, but this is both safer and satisfies static analysis. 2020-04-23 Andrew Stubbs <ams@codesourcery.com> PR other/94629 libgomp/ * plugin/plugin-gcn.c (init_hsa_context): Check return value from hsa_iterate_agents. (GOMP_OFFLOAD_init_device): Check return values from both calls to hsa_agent_iterate_regions.	2020-04-23 15:05:23 +01:00
Frederik Harwath	001ab12e62	openmp: ignore nowait if async execution is unsupported [PR93481] An OpenMP "nowait" clause on a target construct currently leads to a call to GOMP_OFFLOAD_async_run in the plugin that is used for offloading at execution time. The nvptx plugin contains only a stub of this function that always produces a fatal error if called. This commit changes the "nowait" implementation to ignore the clause if the executing device's plugin does not implement GOMP_OFFLOAD_async_run. The stub in the nvptx plugin is removed which effectively means that programs containing "nowait" can now be executed with nvptx offloading as if the clause had not been used. This behavior is consistent with the OpenMP specification which says that "[...] execution of the target task may be deferred" (emphasis added), cf. OpenMP 5.0, page 172. libgomp/ * plugin/plugin-nvptx.c: Remove GOMP_OFFLOAD_async_run stub. * target.c (gomp_load_plugin_for_device): Make "async_run" loading optional. (gomp_target_task_fn): Assert "devicep->async_run_func". (clear_unsupported_flags): New function to remove unsupported flags (right now only GOMP_TARGET_FLAG_NOWAIT) that can be be ignored. (GOMP_target_ext): Apply clear_unsupported_flags to flags. * testsuite/libgomp.c/target-33.c: Remove xfail for offload_target_nvptx. * testsuite/libgomp.c/target-34.c: Likewise.	2020-02-13 10:18:31 +01:00
Andrew Stubbs	591f869ad7	Remove gfx801 "carrizo" support 2020-02-03 Andrew Stubbs <ams@codesourcery.com> gcc/ * config.gcc: Remove "carrizo" support. * config/gcn/gcn-opts.h (processor_type): Likewise. * config/gcn/gcn.c (gcn_omp_device_kind_arch_isa): Likewise. * config/gcn/gcn.opt (gpu_type): Likewise. * config/gcn/t-omp-device: Likewise. libgomp/ * plugin/plugin-gcn.c (EF_AMDGPU_MACH_AMDGCN_GFX801): Remove. (gcn_gfx801_s): Remove. (isa_hsa_name): Remove gfx801. (isa_gcc_name): Remove gfx801/carizzo. (isa_code): Remove gfx801.	2020-02-03 17:23:18 +00:00
Kwok Cheung Yeung	5a28e2727f	[amdgcn] Scale number of threads/workers with VGPR usage 2020-01-31 Kwok Cheung Yeung <kcy@codesourcery.com> gcc/ * config/gcn/mkoffload.c (process_asm): Add sgpr_count and vgpr_count to definition of hsa_kernel_description. Parse assembly to find SGPR and VGPR count of kernel and store in hsa_kernel_description. libgomp/ * plugin/plugin-gcn.c (struct hsa_kernel_description): Add sgpr_count and vgpr_count fields. (struct kernel_info): Add a field for a hsa_kernel_description. (run_kernel): Reduce the number of threads/workers if the requested number would require too many VGPRs. (init_basic_kernel_info): Initialize description field with the hsa_kernel_description entry for the kernel.	2020-01-31 07:13:05 -08:00
Tobias Burnus	5ab5d81b36	Skip plugin-{gcn,hsa} for (-m)x32 (PR bootstrap/93409) PR bootstrap/93409 * plugin/configfrag.ac (enable_offload_targets): Skip HSA and GCN plugin besides -m32 also for -mx32. * configure: Regenerate.	2020-01-30 12:27:17 +01:00
Frederik Harwath	2e5ea57959	Add OpenACC acc_get_property support for AMD GCN Add full support for the OpenACC 2.6 acc_get_property and acc_get_property_string functions to the libgomp GCN plugin. libgomp/ * plugin-gcn.c (struct agent_info): Add fields "name" and "vendor_name" ... (GOMP_OFFLOAD_init_device): ... and init from here. (struct hsa_context_info): Add field "driver_version_s" ... (init_hsa_contest): ... and init from here. (GOMP_OFFLOAD_openacc_get_property): Replace stub with a proper implementation. * testsuite/libgomp.oacc-c-c++-common/acc_get_property.c: Enable test execution for amdgcn and host offloading targets. * testsuite/libgomp.oacc-fortran/acc_get_property.f90: Likewise. * testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c (expect_device_properties): Split function into ... (expect_device_string_properties): ... this new function ... (expect_device_memory): ... and this new function. * testsuite/libgomp.oacc-c-c++-common/acc_get_property-gcn.c: Add test.	2020-01-29 11:54:56 +01:00
Andrew Stubbs	14e5e74698	Fix libgomp plugin-gcn bug 2020-01-23 Andrew Stubbs <ams@codesourcery.com> libgomp/ * plugin/plugin-gcn.c (parse_target_attributes): Use correct mask for the device id.	2020-01-23 12:40:04 +00:00
Frederik Harwath	7d593fd672	Add runtime ISA check for amdgcn offloading The HSA/ROCm runtime rejects binaries not built for the exact GPU device present. So far, the libgomp amdgcn plugin does not verify that the GPU ISA and the ISA specified at compile time match before handing over the binary to the runtime. In case of a mismatch, the user is confronted with an unhelpful runtime error. This commit implements a runtime ISA check. In case of an ISA mismatch, the execution is aborted with a clear error message and a hint at the correct compilation parameters for the GPU on which the execution has been attempted. libgomp/ * plugin/plugin-gcn.c (EF_AMDGPU_MACH): New enum. * (EF_AMDGPU_MACH_MASK): New constant. * (gcn_isa): New typedef. * (gcn_gfx801_s): New constant. * (gcn_gfx803_s): New constant. * (gcn_gfx900_s): New constant. * (gcn_gfx906_s): New constant. * (gcn_isa_name_len): New constant. * (elf_gcn_isa_field): New function. * (isa_hsa_name): New function. * (isa_gcc_name): New function. * (isa_code): New function. * (struct agent_info): Add field "device_isa" and remove field "gfx900_p". * (GOMP_OFFLOAD_init_device): Adapt agent init to "agent_info" field changes, fail if device has unknown ISA. * (parse_target_attributes): Replace "gfx900_p" by "device_isa". * (isa_matches_agent): New function ... * (create_and_finalize_hsa_program): ... used from here to check that the GPU ISA and the code-object ISA match.	2020-01-21 07:41:45 +01:00
Thomas Schwinge	6fc0385c0c	OpenACC 'acc_get_property' cleanup include/ * gomp-constants.h (enum gomp_device_property): Remove. libgomp/ * libgomp-plugin.h (enum goacc_property): New. Adjust all users to use this instead of 'enum gomp_device_property'. (GOMP_OFFLOAD_get_property): Rename to... (GOMP_OFFLOAD_openacc_get_property): ... this. Adjust all users. * libgomp.h (struct gomp_device_descr): Move 'GOMP_OFFLOAD_openacc_get_property'... (struct acc_dispatch_t): ... here. Adjust all users. * plugin/plugin-hsa.c (GOMP_OFFLOAD_get_property): Remove. liboffloadmic/ * plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_get_property): Remove. From-SVN: r280150	2020-01-10 23:24:36 +01:00
Jakub Jelinek	8d9254fc8a	Update copyright years. From-SVN: r279813	2020-01-01 12:51:42 +01:00
Jakub Jelinek	e9dcb75e40	re PR bootstrap/93074 (build FAIL with --enable-offload-targets=nvptx-none) PR bootstrap/93074 * plugin/cuda/cuda.h (cuDeviceGetName, cuDriverGetVersion): Declare. (cuDeviceTotalMem, cuMemGetInfo): Likewise. Define to *_v2. From-SVN: r279747	2019-12-28 10:26:03 +01:00
Maciej W. Rozycki	6c84c8bf9b	Add OpenACC 2.6 `acc_get_property' support Add generic support for the OpenACC 2.6 `acc_get_property' and `acc_get_property_string' routines, as well as full handlers for the host and the NVPTX offload targets and minimal handlers for the HSA, Intel MIC, and AMD GCN offload targets. Included are C/C++ and Fortran tests that, in particular, print the property values for acc_property_vendor, acc_property_memory, acc_property_free_memory, acc_property_name, and acc_property_driver. The output looks as follows: Vendor: GNU Name: GOMP Total memory: 0 Free memory: 0 Driver: 1.0 with the host driver (where the memory related properties are not supported for the host device and yield 0, conforming to the standard) and output like: Vendor: Nvidia Total memory: 12651462656 Free memory: 12202737664 Name: TITAN V Driver: CUDA Driver 9.1 with the NVPTX driver. 2019-12-22 Maciej W. Rozycki <macro@codesourcery.com> Frederik Harwath <frederik@codesourcery.com> Thomas Schwinge <tschwinge@codesourcery.com> include/ * gomp-constants.h (gomp_device_property): New enum. libgomp/ * libgomp.h (gomp_device_descr): Add `get_property_func' member. * libgomp-plugin.h (gomp_device_property_value): New union. (gomp_device_property_value): New prototype. * openacc.h (acc_device_t): Add `acc_device_current' enumeration constant. (acc_device_property_t): New enum. (acc_get_property, acc_get_property_string): New prototypes. * oacc-init.c (acc_get_device_type): Also assert that result is not `acc_device_current'. (get_property_any, acc_get_property, acc_get_property_string): New functions. * openacc.f90 (openacc_kinds): Add `acc_device_current' and `acc_property_memory', `acc_property_free_memory', `acc_property_name', `acc_property_vendor' and `acc_property_driver' constants. Add `acc_device_property' data type. (openacc_internal): Add `acc_get_property' and `acc_get_property_string' interfaces. Add `acc_get_property_h', `acc_get_property_string_h', `acc_get_property_l' and `acc_get_property_string_l'. * oacc-host.c (host_get_property): New function. (host_dispatch): Wire it. * target.c (gomp_load_plugin_for_device): Handle `get_property'. * libgomp.map (OACC_2.6): Add `acc_get_property', `acc_get_property_h_', `acc_get_property_string' and `acc_get_property_string_h_' symbols. * libgomp.texi (OpenACC Runtime Library Routines): Add `acc_get_property'. (acc_get_property): New node. * plugin/plugin-gcn.c (GOMP_OFFLOAD_get_property): New function (stub). * plugin/plugin-hsa.c (GOMP_OFFLOAD_get_property): New function. * plugin/plugin-nvptx.c (CUDA_CALLS): Add `cuDeviceGetName', `cuDeviceTotalMem', `cuDriverGetVersion' and `cuMemGetInfo' calls. (GOMP_OFFLOAD_get_property): New function. (struct ptx_device): Add new field "name". (cuda_driver_version_s): Add new static variable ... (nvptx_init): ... and init from here. * testsuite/libgomp.oacc-c-c++-common/acc_get_property.c: New test. * testsuite/libgomp.oacc-c-c++-common/acc_get_property-2.c: New test. * testsuite/libgomp.oacc-c-c++-common/acc_get_property-3.c: New test. * testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c: New file with test helper functions. * testsuite/libgomp.oacc-fortran/acc_get_property.f90: New test. liboffloadmic/ * plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_get_property): New function. Reviewed-by: Thomas Schwinge <thomas@codesourcery.com> Co-Authored-By: Frederik Harwath <frederik@codesourcery.com> Co-Authored-By: Thomas Schwinge <tschwinge@codesourcery.com> From-SVN: r279710	2019-12-22 19:54:09 +00:00
Tobias Burnus	93d9021987	libgomp – spelling fixes, incl. omp_lib.h.in * omp_lib.h.in: Fix spelling of function declaration omp_get_cancell(l)ation. * libgomp.texi (acc_is_present, acc_async_test, acc_async_test_all): Fix typos. * env.c: Fix comment typos. * oacc-host.c: Likewise. * ordered.c: Likewise. * task.c: Likewise. * team.c: Likewise. * config/gcn/task.c: Likewise. * config/gcn/team.c: Likewise. * config/nvptx/task.c: Likewise. * config/nvptx/team.c: Likewise. * plugin/plugin-gcn.c: Likewise. * testsuite/libgomp.fortran/jacobi.f: Likewise. * testsuite/libgomp.hsa.c/tiling-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/enter_exit-lib.c: Likewise. From-SVN: r279218	2019-12-11 12:45:49 +01:00
Julian Brown	d88b27daa1	AMD GCN libgomp plugin queue-full condition locking fix libgomp/ * plugin/plugin-gcn.c (wait_for_queue_nonfull): Don't lock/unlock aq->mutex here. (queue_push_launch): Lock aq->mutex before calling wait_for_queue_nonfull. (queue_push_callback): Likewise. (queue_push_asyncwait): Likewise. (queue_push_placeholder): Likewise. Reviewed-by: Andrew Stubbs <ams@codesourcery.com> From-SVN: r278517	2019-11-20 17:56:30 +00:00
Julian Brown	8d2f4ddfd7	Fix host-to-device copies from rodata for AMD GCN libgomp/ * plugin/plugin-gcn.c (hsa_memory_copy_wrapper): New. (copy_data, GOMP_OFFLOAD_host2dev): Use above function. (GOMP_OFFLOAD_dev2host, GOMP_OFFLOAD_dev2dev): Check hsa_memory_copy return code. Reviewed-by: Andrew Stubbs <ams@codesourcery.com> From-SVN: r278516	2019-11-20 17:53:31 +00:00
Andrew Stubbs	237957cc2c	GCN Libgomp Plugin 2019-11-13 Andrew Stubbs <ams@codesourcery.com> Kwok Cheung Yeung <kcy@codesourcery.com> Julian Brown <julian@codesourcery.com> Tom de Vries <tom@codesourcery.com> libgomp/ * plugin/Makefrag.am: Add amdgcn plugin support. * plugin/configfrag.ac: Likewise. * plugin/plugin-gcn.c: New file. * configure: Regenerate. * Makefile.in: Regenerate. * testsuite/Makefile.in: Regenerate. Co-Authored-By: Julian Brown <julian@codesourcery.com> Co-Authored-By: Kwok Cheung Yeung <kcy@codesourcery.com> Co-Authored-By: Tom de Vries <tom@codesourcery.com> From-SVN: r278138	2019-11-13 12:38:18 +00:00
Andrew Stubbs	d2903ce05b	Add device number to GOMP_OFFLOAD_openacc_async_construct 2019-11-13 Andrew Stubbs <ams@codesourcery.com> Julian Brown <julian@codesourcery.com> libgomp/ * libgomp-plugin.h (GOMP_OFFLOAD_openacc_async_construct): Add int parameter. * oacc-async.c (lookup_goacc_asyncqueue): Pass device number to the queue constructor. * oacc-host.c (host_openacc_async_construct): Add device parameter. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_openacc_async_construct): Add device parameter. Co-Authored-By: Julian Brown <julian@codesourcery.com> From-SVN: r278134	2019-11-13 12:37:59 +00:00
Jakub Jelinek	810f316dd6	configure.ac: Remove GCC_HEADER_STDINT(gstdint.h). * configure.ac: Remove GCC_HEADER_STDINT(gstdint.h). * libgomp.h: Include <stdint.h> instead of "gstdint.h". * oacc-parallel.c: Don't include "libgomp_g.h". * plugin/plugin-hsa.c: Include <stdint.h> instead of "gstdint.h". * plugin/plugin-nvptx.c: Don't include "gstdint.h". * aclocal.m4: Regenerated. * config.h.in: Regenerated. * configure: Regenerated. * Makefile.in: Regenerated. From-SVN: r276389	2019-10-01 09:51:46 +02:00
Tobias Burnus	c28712beb4	libgomp plugin - init string libgomp/ 2019-09-13 Tobias Burnus <tobias@codesourcery.com> * plugin/plugin-hsa.c (hsa_warn, hsa_fatal, hsa_error): Ensure string is initialized. From-SVN: r275703	2019-09-13 20:14:02 +02:00
Jakub Jelinek	b5c26449f3	re PR libgomp/90585 (libgomp hsa plugin ftbfs in the x32 multilib variant) PR libgomp/90585 * plugin/plugin-hsa.c: Include gstdint.h. Include inttypes.h only if HAVE_INTTYPES_H is defined. (print_uint64_t): New typedef. (PRIu64): Define if HAVE_INTTYPES_H is not defined. (print_kernel_dispatch, run_kernel): Use PRIu64 macro instead of "lu", cast uint64_t HSA_DEBUG and fprintf arguments to print_uint64_t. (release_kernel_dispatch): Likewise. Cast shadow->debug to uintptr_t before casting to void . plugin/plugin-nvptx.c: Include gstdint.h instead of stdint.h. * oacc-mem.c: Don't include config.h nor stdint.h. * target.c: Don't include config.h. * oacc-cuda.c: Likewise. * oacc-host.c: Don't include stdint.h. From-SVN: r271597	2019-05-24 10:59:37 +02:00
Thomas Schwinge	5fae049dc2	OpenACC Profiling Interface (incomplete) libgomp/ * acc_prof.h: New file. * oacc-profiling.c: Likewise. * Makefile.am (nodist_libsubinclude_HEADERS, libgomp_la_SOURCES): Add these, respectively. * Makefile.in: Regenerate. * env.c (initialize_env): Call goacc_profiling_initialize. * oacc-plugin.c (GOMP_PLUGIN_goacc_thread) (GOMP_PLUGIN_goacc_profiling_dispatch): New functions. * oacc-plugin.h (GOMP_PLUGIN_goacc_thread) (GOMP_PLUGIN_goacc_profiling_dispatch): Declare. * libgomp.map (OACC_2.5.1): Add acc_prof_lookup, acc_prof_register, acc_prof_unregister, and acc_register_library. (GOMP_PLUGIN_1.3): Add GOMP_PLUGIN_goacc_profiling_dispatch, and GOMP_PLUGIN_goacc_thread. * oacc-int.h (struct goacc_thread): Add prof_info, api_info, prof_callbacks_enabled members. (goacc_prof_enabled, goacc_profiling_initialize) (_goacc_profiling_dispatch_p, _goacc_profiling_setup_p) (goacc_profiling_dispatch): Declare. (GOACC_PROF_ENABLED, GOACC_PROFILING_DISPATCH_P) (GOACC_PROFILING_SETUP_P): Define. * oacc-async.c (acc_async_test, acc_async_test_all, acc_wait) (acc_wait_async, acc_wait_all, acc_wait_all_async): Update for OpenACC Profiling Interface. * oacc-cuda.c (acc_get_current_cuda_device) (acc_get_current_cuda_context, acc_get_cuda_stream) (acc_set_cuda_stream): Likewise. * oacc-init.c (acc_init_1, goacc_attach_host_thread_to_device) (acc_init, acc_set_device_type, acc_get_device_type) (acc_get_device_num, goacc_lazy_initialize): Likewise. * oacc-mem.c (acc_malloc, acc_free, memcpy_tofrom_device) (acc_deviceptr, acc_hostptr, acc_is_present, acc_map_data) (acc_unmap_data, present_create_copy, delete_copyout) (update_dev_host): Likewise. * oacc-parallel.c (GOACC_parallel_keyed, GOACC_data_start) (GOACC_data_end, GOACC_enter_exit_data, GOACC_update, GOACC_wait): Likewise. * plugin/plugin-nvptx.c (nvptx_exec, nvptx_alloc, nvptx_free) (GOMP_OFFLOAD_openacc_exec, GOMP_OFFLOAD_openacc_async_exec): Likewise. * libgomp.texi: Update. * testsuite/libgomp.oacc-c-c++-common/acc_prof-dispatch-1.c: New file. * testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/acc_prof-valid_bytes-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/acc_prof-version-1.c: Likewise. From-SVN: r271346	2019-05-17 21:13:36 +02:00
Chung-Lin Tang	1f4c5b9bb2	2019-05-13 Chung-Lin Tang <cltang@codesourcery.com> Reviewed-by: Thomas Schwinge <thomas@codesourcery.com> libgomp/ * libgomp-plugin.h (struct goacc_asyncqueue): Declare. (struct goacc_asyncqueue_list): Likewise. (goacc_aq): Likewise. (goacc_aq_list): Likewise. (GOMP_OFFLOAD_openacc_register_async_cleanup): Remove. (GOMP_OFFLOAD_openacc_async_test): Remove. (GOMP_OFFLOAD_openacc_async_test_all): Remove. (GOMP_OFFLOAD_openacc_async_wait): Remove. (GOMP_OFFLOAD_openacc_async_wait_async): Remove. (GOMP_OFFLOAD_openacc_async_wait_all): Remove. (GOMP_OFFLOAD_openacc_async_wait_all_async): Remove. (GOMP_OFFLOAD_openacc_async_set_async): Remove. (GOMP_OFFLOAD_openacc_exec): Adjust declaration. (GOMP_OFFLOAD_openacc_cuda_get_stream): Likewise. (GOMP_OFFLOAD_openacc_cuda_set_stream): Likewise. (GOMP_OFFLOAD_openacc_async_exec): Declare. (GOMP_OFFLOAD_openacc_async_construct): Declare. (GOMP_OFFLOAD_openacc_async_destruct): Declare. (GOMP_OFFLOAD_openacc_async_test): Declare. (GOMP_OFFLOAD_openacc_async_synchronize): Declare. (GOMP_OFFLOAD_openacc_async_serialize): Declare. (GOMP_OFFLOAD_openacc_async_queue_callback): Declare. (GOMP_OFFLOAD_openacc_async_host2dev): Declare. (GOMP_OFFLOAD_openacc_async_dev2host): Declare. * libgomp.h (struct acc_dispatch_t): Define 'async' sub-struct. (gomp_acc_insert_pointer): Adjust declaration. (gomp_copy_host2dev): New declaration. (gomp_copy_dev2host): Likewise. (gomp_map_vars_async): Likewise. (gomp_unmap_tgt): Likewise. (gomp_unmap_vars_async): Likewise. (gomp_fini_device): Likewise. * oacc-async.c (get_goacc_thread): New function. (get_goacc_thread_device): New function. (lookup_goacc_asyncqueue): New function. (get_goacc_asyncqueue): New function. (acc_async_test): Adjust code to use new async design. (acc_async_test_all): Likewise. (acc_wait): Likewise. (acc_wait_async): Likewise. (acc_wait_all): Likewise. (acc_wait_all_async): Likewise. (goacc_async_free): New function. (goacc_init_asyncqueues): Likewise. (goacc_fini_asyncqueues): Likewise. * oacc-cuda.c (acc_get_cuda_stream): Adjust code to use new async design. (acc_set_cuda_stream): Likewise. * oacc-host.c (host_openacc_exec): Adjust parameters, remove 'async'. (host_openacc_register_async_cleanup): Remove. (host_openacc_async_exec): New function. (host_openacc_async_test): Adjust parameters. (host_openacc_async_test_all): Remove. (host_openacc_async_wait): Remove. (host_openacc_async_wait_async): Remove. (host_openacc_async_wait_all): Remove. (host_openacc_async_wait_all_async): Remove. (host_openacc_async_set_async): Remove. (host_openacc_async_synchronize): New function. (host_openacc_async_serialize): New function. (host_openacc_async_host2dev): New function. (host_openacc_async_dev2host): New function. (host_openacc_async_queue_callback): New function. (host_openacc_async_construct): New function. (host_openacc_async_destruct): New function. (struct gomp_device_descr host_dispatch): Remove initialization of old interface, add intialization of new async sub-struct. * oacc-init.c (acc_shutdown_1): Adjust to use gomp_fini_device. (goacc_attach_host_thread_to_device): Remove old async code usage. * oacc-int.h (goacc_init_asyncqueues): New declaration. (goacc_fini_asyncqueues): Likewise. (goacc_async_copyout_unmap_vars): Likewise. (goacc_async_free): Likewise. (get_goacc_asyncqueue): Likewise. (lookup_goacc_asyncqueue): Likewise. * oacc-mem.c (memcpy_tofrom_device): Adjust code to use new async design. (present_create_copy): Adjust code to use new async design. (delete_copyout): Likewise. (update_dev_host): Likewise. (gomp_acc_insert_pointer): Add async parameter, adjust code to use new async design. (gomp_acc_remove_pointer): Adjust code to use new async design. * oacc-parallel.c (GOACC_parallel_keyed): Adjust code to use new async design. (GOACC_enter_exit_data): Likewise. (goacc_wait): Likewise. (GOACC_update): Likewise. * oacc-plugin.c (GOMP_PLUGIN_async_unmap_vars): Change to assert fail when called, warn as obsolete in comment. * target.c (goacc_device_copy_async): New function. (gomp_copy_host2dev): Remove 'static', add goacc_asyncqueue parameter, add goacc_device_copy_async case. (gomp_copy_dev2host): Likewise. (gomp_map_vars_existing): Add goacc_asyncqueue parameter, adjust code. (gomp_map_pointer): Likewise. (gomp_map_fields_existing): Likewise. (gomp_map_vars_internal): New always_inline function, renamed from gomp_map_vars. (gomp_map_vars): Implement by calling gomp_map_vars_internal. (gomp_map_vars_async): Implement by calling gomp_map_vars_internal, passing goacc_asyncqueue argument. (gomp_unmap_tgt): Remove static, add attribute_hidden. (gomp_unref_tgt): New function. (gomp_unmap_vars_internal): New always_inline function, renamed from gomp_unmap_vars. (gomp_unmap_vars): Implement by calling gomp_unmap_vars_internal. (gomp_unmap_vars_async): Implement by calling gomp_unmap_vars_internal, passing goacc_asyncqueue argument. (gomp_fini_device): New function. (gomp_exit_data): Adjust gomp_copy_dev2host call. (gomp_load_plugin_for_device): Remove old interface, adjust to load new async interface. (gomp_target_fini): Adjust code to call gomp_fini_device. * plugin/plugin-nvptx.c (struct cuda_map): Remove. (struct ptx_stream): Remove. (struct nvptx_thread): Remove current_stream field. (cuda_map_create): Remove. (cuda_map_destroy): Remove. (map_init): Remove. (map_fini): Remove. (map_pop): Remove. (map_push): Remove. (struct goacc_asyncqueue): Define. (struct nvptx_callback): Define. (struct ptx_free_block): Define. (struct ptx_device): Remove null_stream, active_streams, async_streams, stream_lock, and next fields. (enum ptx_event_type): Remove. (struct ptx_event): Remove. (ptx_event_lock): Remove. (ptx_events): Remove. (init_streams_for_device): Remove. (fini_streams_for_device): Remove. (select_stream_for_async): Remove. (nvptx_init): Remove ptx_events and ptx_event_lock references. (nvptx_attach_host_thread_to_device): Remove CUDA_ERROR_NOT_PERMITTED case. (nvptx_open_device): Add free_blocks initialization, remove init_streams_for_device call. (nvptx_close_device): Remove fini_streams_for_device call, add free_blocks destruct code. (event_gc): Remove. (event_add): Remove. (nvptx_exec): Adjust parameters and code. (nvptx_free): Likewise. (nvptx_host2dev): Remove. (nvptx_dev2host): Remove. (nvptx_set_async): Remove. (nvptx_async_test): Remove. (nvptx_async_test_all): Remove. (nvptx_wait): Remove. (nvptx_wait_async): Remove. (nvptx_wait_all): Remove. (nvptx_wait_all_async): Remove. (nvptx_get_cuda_stream): Remove. (nvptx_set_cuda_stream): Remove. (GOMP_OFFLOAD_alloc): Adjust code. (GOMP_OFFLOAD_free): Likewise. (GOMP_OFFLOAD_openacc_register_async_cleanup): Remove. (GOMP_OFFLOAD_openacc_exec): Adjust parameters and code. (GOMP_OFFLOAD_openacc_async_test_all): Remove. (GOMP_OFFLOAD_openacc_async_wait): Remove. (GOMP_OFFLOAD_openacc_async_wait_async): Remove. (GOMP_OFFLOAD_openacc_async_wait_all): Remove. (GOMP_OFFLOAD_openacc_async_wait_all_async): Remove. (GOMP_OFFLOAD_openacc_async_set_async): Remove. (cuda_free_argmem): New function. (GOMP_OFFLOAD_openacc_async_exec): New plugin hook function. (GOMP_OFFLOAD_openacc_create_thread_data): Adjust code. (GOMP_OFFLOAD_openacc_cuda_get_stream): Adjust code. (GOMP_OFFLOAD_openacc_cuda_set_stream): Adjust code. (GOMP_OFFLOAD_openacc_async_construct): New plugin hook function. (GOMP_OFFLOAD_openacc_async_destruct): New plugin hook function. (GOMP_OFFLOAD_openacc_async_test): Remove and re-implement. (GOMP_OFFLOAD_openacc_async_synchronize): New plugin hook function. (GOMP_OFFLOAD_openacc_async_serialize): New plugin hook function. (GOMP_OFFLOAD_openacc_async_queue_callback): New plugin hook function. (cuda_callback_wrapper): New function. (cuda_memcpy_sanity_check): New function. (GOMP_OFFLOAD_host2dev): Remove and re-implement. (GOMP_OFFLOAD_dev2host): Remove and re-implement. (GOMP_OFFLOAD_openacc_async_host2dev): New plugin hook function. (GOMP_OFFLOAD_openacc_async_dev2host): New plugin hook function. From-SVN: r271128	2019-05-13 13:32:00 +00:00
Thomas Schwinge	0a0384b43a	[libgomp] In OpenACC testing, cycle though all offload targets ... instead of through offload plugins. libgomp/ * plugin/configfrag.ac: Populate and AC_SUBST offload_targets. * testsuite/libgomp-test-support.exp.in: Adjust. * testsuite/lib/libgomp.exp: Likewise. Don't populate openacc_device_types_s. (offload_target_to_openacc_device_type): New proc. * testsuite/libgomp.oacc-c++/c++.exp: Adjust. * testsuite/libgomp.oacc-c/c.exp: Likewise. * testsuite/libgomp.oacc-fortran/fortran.exp: Likewise. * Makefile.in: Regenerate. * configure: Likewise. * testsuite/Makefile.in: Likewise. From-SVN: r269108	2019-02-22 11:51:20 +01:00
Thomas Schwinge	ee332b4a9a	[libgomp] Clarify difference between offload target, offload plugin, and OpenACC device type libgomp/ * plugin/configfrag.ac: Populate and AC_SUBST offload_plugins instead of offload_targets, and AC_DEFINE_UNQUOTED OFFLOAD_PLUGINS instead of OFFLOAD_TARGETS. * target.c (gomp_target_init): Adjust. * testsuite/libgomp-test-support.exp.in: Likewise. * testsuite/lib/libgomp.exp: Likewise. Populate openacc_device_types_s instead of offload_targets_s_openacc. (check_effective_target_openacc_nvidia_accel_selected) (check_effective_target_openacc_host_selected): Adjust. * testsuite/libgomp.oacc-c++/c++.exp: Likewise. * testsuite/libgomp.oacc-c/c.exp: Likewise. * testsuite/libgomp.oacc-fortran/fortran.exp: Likewise. * Makefile.in: Regenerate. * config.h.in: Likewise. * configure: Likewise. * testsuite/Makefile.in: Likewise. From-SVN: r269107	2019-02-22 11:51:05 +01:00
Tom de Vries	738c56d410	[nvptx, libgomp] Fix memleak in GOMP_OFFLOAD_fini_device I wrote a test-case: ... int main (void) { for (unsigned i = 0; i < 128; ++i) { acc_init (acc_device_nvidia); acc_shutdown (acc_device_nvidia); } return 0; } ... and ran it under valgrind. The only leak location reported with a frequency of 128, was the allocation of ptx_devices in nvptx_init. Fix this by freeing ptx_devices in GOMP_OFFLOAD_fini_device, once instantiated_devices drops to 0. 2019-01-24 Tom de Vries <tdevries@suse.de> * plugin/plugin-nvptx.c (GOMP_OFFLOAD_fini_device): Free ptx_devices once instantiated_devices drops to 0. From-SVN: r268237	2019-01-24 14:12:19 +00:00
Tom de Vries	4a75460b00	[nvptx, libgomp] Fix cuMemAlloc with size zero Consider test-case: ... int main (void) { #pragma acc parallel async ; #pragma acc parallel async ; #pragma acc wait return 0; } ... This fails with: ... libgomp: cuMemAlloc error: invalid argument Segmentation fault (core dumped) ... The cuMemAlloc error is due to the fact that we're try to allocate 0 bytes. Fix this by preventing calling map_push with size zero argument in nvptx_exec. This also has the consequence that for the abort-1.c test-case, we end up calling cuMemFree during map_fini for the struct cuda_map allocated in map_init, which fails because an abort happened. Fix this by calling cuMemFree with CUDA_CALL_NOCHECK in cuda_map_destroy. 2019-01-23 Tom de Vries <tdevries@suse.de> PR target/PR88946 * plugin/plugin-nvptx.c (cuda_map_destroy): Use CUDA_CALL_NOCHECK for cuMemFree. (nvptx_exec): Don't call map_push if mapnum == 0. * testsuite/libgomp.oacc-c-c++-common/pr88946.c: New test. From-SVN: r268178	2019-01-23 08:16:56 +00:00
Tom de Vries	4fef8e4d8c	[nvptx, libgomp] Fix assert (!s->map->active) in map_fini There are currently two situations where this assert triggers: ... libgomp/plugin/plugin-nvptx.c: map_fini: Assertion `!s->map->active' failed. ... First, in abort-1.c, a parallel region triggering an abort: ... int main (void) { #pragma acc parallel abort (); return 0; } ... The abort is detected in nvptx_exec as the CUDA_ERROR_ILLEGAL_INSTRUCTION return status of the cuStreamSynchronize call after kernel launch, which is then handled by calling non-returning function GOMP_PLUGIN_fatal. Consequently, the map_pop in nvptx_exec that in case of cuStreamSynchronize success would remove or inactive the element added by the map_push earlier in nvptx_exec, does not trigger. With the element no longer active, but still marked active and a member of s->map, we run into the assert during GOMP_OFFLOAD_fini_device, which is triggered from atexit handler gomp_target_fini (which is triggered by the GOMP_PLUGIN_fatal mentioned above calling exit). Second, in pr88941.c, an async parallel region without wait: ... int main (void) { #pragma acc parallel async ; /* no #pragma acc wait / return 0; } ... Because nvptx_exec is handling an async region, it does not call map_pop for the element added by map_push, but schedules an kernel execution completion event to call map_pop. Again, we run into the assert during GOMP_OFFLOAD_fini_device, which is triggered from atexit handler gomp_target_fini, but the exit in this case is triggered by returning from main. So either the kernel is still running, or the kernel has completed but the corresponding event that is supposed to call map_pop is stuck in the event queue, waiting for an event_gc. Fix this by removing the assert, and skipping the freeing of device memory if the map is still marked active (though in the async case, this is more a workaround than an fix). 2019-01-23 Tom de Vries <tdevries@suse.de> PR target/88941 PR target/88939 plugin/plugin-nvptx.c (cuda_map_destroy): Handle map->active case. (map_fini): Remove "assert (!s->map->active)". * testsuite/libgomp.oacc-c-c++-common/pr88941.c: New test. From-SVN: r268177	2019-01-23 08:16:42 +00:00
Tom de Vries	2ee6cb22c1	[nvptx, libgomp] Fix map_push The map field of a struct ptx_stream is a FIFO. The FIFO is implemented as a single linked list, with pop-from-the-front semantics. The function map_pop pops an element, either by: - deallocating the element, if there is more than one element - or marking the element inactive, if there's only one element The responsibility of map_push is to push an element to the back, as well as selecting the element to push, by: - allocating an element, or - reusing the element at the front if inactive and big enough, or - dropping the element at the front if inactive and not big enough, and allocating one that's big enough The current implemention gets at least the first and most basic scenario wrong: > map = cuda_map_create (size); We create an element, and assign it to map. > for (t = s->map; t->next != NULL; t = t->next) > ; We determine the last element in the fifo. > t->next = map; We append the new element. > s->map = map; But here, we throw away the rest of the FIFO, and declare the FIFO to be just the new element. This problem causes the test-case asyncwait-1.c to fail intermittently on some systems. The pr87835.c test-case added here is a a minimized and modified version of asyncwait-1.c (avoiding the kernel construct) that is more likely to fail. Fix this by rewriting map_pop more robustly, by: - seperating the function in two phases: select element, push element - when reusing or dropping an element, making sure that the element is cleanly popped from the queue - rewriting the push element part in such a way that it can handle all cases without needing if statements, such that each line is exercised for each of the three cases. 2019-01-23 Tom de Vries <tdevries@suse.de> PR target/87835 * plugin/plugin-nvptx.c (map_push): Fix adding of allocated element. * testsuite/libgomp.oacc-c-c++-common/pr87835.c: New test. From-SVN: r268176	2019-01-23 08:16:11 +00:00
Tom de Vries	2c2ff1684d	[nvptx] Enable setting vector length using -fopenacc-dim Enable setting vector length using -fopenacc-dim, f.i. -fopenacc-dim=::128. 2019-01-12 Tom de Vries <tdevries@suse.de> * config/nvptx/nvptx.c (nvptx_goacc_validate_dims_1): Alow setting vector length using -fopenacc-dim. * plugin/plugin-nvptx.c (nvptx_exec): Update error message. From-SVN: r267896	2019-01-12 22:19:15 +00:00
Tom de Vries	52d22ece49	[nvptx] Update insufficient launch message for variable vector_length Update message in nvptx libgomp plugin about insufficient resources to launch kernel, to accommodate for the fact the vector_length can now be variable. 2019-01-12 Tom de Vries <tdevries@suse.de> * plugin/plugin-nvptx.c (nvptx_exec): Update insufficient hardware resources diagnostic. From-SVN: r267890	2019-01-12 22:18:00 +00:00
Tom de Vries	052aaaceed	[nvptx] Don't allow vector_length 64 with num_workers 16 When using a compiler build with: ... +#define PTX_DEFAULT_VECTOR_LENGTH PTX_CTA_SIZE ... consider a test-case: ... int main (void) { #pragma acc parallel vector_length (64) #pragma acc loop worker for (unsigned int i = 0; i < 32; i++) #pragma acc loop vector for (unsigned int j = 0; j < 64; j++) ; return 0; } ... If num_workers is 16, either because: - we add a "num_workers (16)" clause on the parallel directive, or - we set "GOMP_OPENACC_DIM=:16:", or - the libgomp plugin chooses 16 num_workers we run into an illegal instruction at runtime, because a bar.sync instruction tries to use a barrier 16. The instruction is illegal, because ptx supports only 16 barriers per CTA, and the valid range is 0..15. The problem is that with a warp-multiple vector length, we use a code generation scheme with a per-worker barrier. And because barrier zero is reserved for per-cta barrier, only the remaining 15 barriers can be used as per-worker barrier, and consequently we can't use num_workers larger than 15. This problem occurs only for vector_length 64. For vector_length 32, we use a different code generation scheme, and for vector_length >= 96, the maximum num_workers is not big enough not to trigger this problem. Also, this problem only occurs for num_workers 16. As explained above, num_workers 15 is safe to use, and 16 is already the maximum num_workers for vector_length 64. This patch fixes the problem in both the compiler (handling "num_workers (16)") and in the libgomp nvptx plugin (with and without "GOMP_OPENACC_DIM=:16:"). 2019-01-11 Tom de Vries <tdevries@suse.de> * config/nvptx/nvptx.c (PTX_CTA_NUM_BARRIERS, PTX_PER_CTA_BARRIER) (PTX_NUM_PER_CTA_BARRIER, PTX_FIRST_PER_WORKER_BARRIER) (PTX_NUM_PER_WORKER_BARRIERS): Define. (nvptx_apply_dim_limits): Prevent vector_length 64 and num_workers 16. * plugin/plugin-nvptx.c (nvptx_exec): Prevent vector_length 64 and num_workers 16. From-SVN: r267838	2019-01-11 11:46:43 +00:00
Tom de Vries	2c372e81a9	[nvptx, libgomp] Don't launch with num_workers == 0 When using a compiler build with: ... +#define PTX_DEFAULT_VECTOR_LENGTH PTX_CTA_SIZE +#define PTX_MAX_VECTOR_LENGTH PTX_CTA_SIZE ... and running the libgomp testsuite, we run into an execution failure in parallel-loop-1.c, due to a cuda launch failure: ... nvptx_exec: kernel f6_none_none$_omp_fn$0: launch gangs=480, workers=0, \ vectors=1024 libgomp: cuLaunchKernel error: invalid argument ... because workers == 0. The workers variable is set to 0 here in nvptx_exec: ... workers = blocks / actual_vectors; ... because actual_vectors is 1024, and blocks is 768: ... cuOccupancyMaxPotentialBlockSize: grid = 10, block = 768 ... Fix this by ensuring that workers is at least one. 2019-01-09 Tom de Vries <tdevries@suse.de> * plugin/plugin-nvptx.c (nvptx_exec): Make sure to launch with at least one worker. From-SVN: r267746	2019-01-09 00:07:45 +00:00
Jakub Jelinek	a554497024	Update copyright years. From-SVN: r267494	2019-01-01 13:31:55 +01:00
Thomas Schwinge	f847198ec3	[PR88495] An OpenACC async queue is always synchronized with itself An OpenACC async queue is always synchronized with itself, so invocations like "#pragma acc wait(0) async(0)", or "acc_wait_async (0, 0)" don't make a lot of sense, but are still valid. libgomp/ PR libgomp/88495 * plugin/plugin-nvptx.c (nvptx_wait_async): Don't refuse "identical parameters". * testsuite/libgomp.oacc-c-c++-common/asyncwait-nop-1.c: Update. * testsuite/libgomp.oacc-c-c++-common/lib-80.c: Remove. From-SVN: r267152	2018-12-14 21:43:02 +01:00
Thomas Schwinge	1404af62dc	[PR88407] [OpenACC] Correctly handle unseen async-arguments ... which turn the operation into a no-op. libgomp/ PR libgomp/88407 * plugin/plugin-nvptx.c (nvptx_async_test, nvptx_wait) (nvptx_wait_async): Unseen async-argument is a no-op. * testsuite/libgomp.oacc-c-c++-common/async_queue-1.c: Update. * testsuite/libgomp.oacc-c-c++-common/data-2-lib.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/data-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-79.c: Likewise. * testsuite/libgomp.oacc-fortran/lib-12.f90: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-71.c: Merge into... * testsuite/libgomp.oacc-c-c++-common/lib-69.c: ... this. Update. * testsuite/libgomp.oacc-c-c++-common/lib-77.c: Merge into... * testsuite/libgomp.oacc-c-c++-common/lib-74.c: ... this. Update From-SVN: r267150	2018-12-14 21:42:40 +01:00
Thomas Schwinge	18c247cc0b	[PR88370] acc_get_cuda_stream/acc_set_cuda_stream: acc_async_sync, acc_async_noval Per my reading of the OpenACC specification (and as supported by secondary documentation, such as code examples, or presentations), it's valid to call "acc_get_cuda_stream"/"acc_set_cuda_stream" also with "acc_async_sync", "acc_async_noval" arguments, not just with the nonnegative values as currently implemented. libgomp/ PR libgomp/88370 * libgomp.texi (acc_get_current_cuda_context, acc_get_cuda_stream) (acc_set_cuda_stream): Clarify. * oacc-cuda.c (acc_get_cuda_stream, acc_set_cuda_stream): Use "async_valid_p". * plugin/plugin-nvptx.c (nvptx_set_cuda_stream): Refuse "async == acc_async_sync". * testsuite/libgomp.oacc-c-c++-common/acc_set_cuda_stream-1.c: New file. * testsuite/libgomp.oacc-c-c++-common/async_queue-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-84.c: Update. * testsuite/libgomp.oacc-c-c++-common/lib-85.c: Likewise. From-SVN: r267147	2018-12-14 21:42:08 +01:00
Cesar Philippidis	2049befdd0	[nvptx] Remove use of CUDA unified memory in libgomp libgomp/ * plugin/plugin-nvptx.c (struct cuda_map): New. (struct ptx_stream): Replace d, h, h_begin, h_end, h_next, h_prev, h_tail with (cuda_map *) map. (cuda_map_create): New function. (cuda_map_destroy): New function. (map_init): Update to use a linked list of cuda_map objects. (map_fini): Likewise. (map_pop): Likewise. (map_push): Likewise. Return CUdeviceptr instead of void. (init_streams_for_device): Remove stales references to ptx_stream members. (select_stream_for_async): Likewise. (nvptx_exec): Update call to map_init. From-SVN: r264397	2018-09-18 08:41:54 -07:00
Cesar Philippidis	bd9b3d3d1a	[nvptx] Use CUDA driver API to select default runtime launch geometry The CUDA driver API starting version 6.5 offers a set of runtime functions to calculate several occupancy-related measures, as a replacement for the occupancy calculator spreadsheet. This patch adds a heuristic for default runtime launch geometry, based on the new runtime function cuOccupancyMaxPotentialBlockSize. Build on x86_64 with nvptx accelerator and ran libgomp testsuite. 2018-08-13 Cesar Philippidis <cesar@codesourcery.com> Tom de Vries <tdevries@suse.de> PR target/85590 * plugin/cuda/cuda.h (CUoccupancyB2DSize): New typedef. (cuOccupancyMaxPotentialBlockSize): Declare. * plugin/cuda-lib.def (cuOccupancyMaxPotentialBlockSize): New CUDA_ONE_CALL_MAYBE_NULL. * plugin/plugin-nvptx.c (CUDA_VERSION < 6050): Define CUoccupancyB2DSize and declare cuOccupancyMaxPotentialBlockSize. (nvptx_exec): Use cuOccupancyMaxPotentialBlockSize to set the default num_gangs and num_workers when the driver supports it. Co-Authored-By: Tom de Vries <tdevries@suse.de> From-SVN: r263505	2018-08-13 12:04:24 +00:00
Tom de Vries	8e09a12f01	[libgomp, nvptx] Fall back to cuLinkAddData/cuLinkCreate if _v2 not found Cuda driver api functions cuLinkAddData and cuLinkCreate are available starting version 5.5. In version 6.5, they are remapped onto _v2 versions. The dlopen interface of the libgomp nvptx plugin uses the _v2 versions, so it won't work with a cuda driver with driver api version lower than 6.5. This patch fixes the problem by testing for the presence of the _v2 versions, and falling back to the original versions in case of absence of the _v2 versions. Build on x86_64 with nvptx accelerator and reg-tested libgomp, both with and without --without-cuda-driver. 2018-08-08 Tom de Vries <tdevries@suse.de> * plugin/cuda-lib.def (cuLinkAddData_v2, cuLinkCreate_v2): Declare using CUDA_ONE_CALL_MAYBE_NULL. * plugin/plugin-nvptx.c (cuLinkAddData, cuLinkCreate): Undef and declare. (cuLinkAddData_v2, cuLinkCreate_v2): Declare. (link_ptx): Fall back to cuLinkAddData/cuLinkCreate if the _v2 versions are not found. From-SVN: r263408	2018-08-08 14:26:37 +00:00
Tom de Vries	cedd9bd016	[libgomp, nvptx] Allow cuGetErrorString to be NULL Cuda driver api function cuGetErrorString is available in version 6.0 and higher. Currently, when the driver that is used does not contain this function, the libgomp nvptx plugin will not build (PLUGIN_NVPTX_DYNAMIC == 0) or run (PLUGIN_NVPTX_DYNAMIC == 1). This patch fixes this problem by testing for the presence of the function, and handling absence. Build on x86_64 with nvptx accelerator and reg-tested libgomp, both with and without --without-cuda-driver. 2018-08-08 Tom de Vries <tdevries@suse.de> * plugin/cuda-lib.def (cuGetErrorString): Use CUDA_ONE_CALL_MAYBE_NULL. * plugin/plugin-nvptx.c (cuda_error): Handle if cuGetErrorString is not present. From-SVN: r263407	2018-08-08 14:26:28 +00:00
Tom de Vries	b113af959c	[libgomp, nvptx] Remove hard-coded const in nvptx_open_device CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR is defined in cuda driver api version 6.0 and higher. Currently nvptx_open_device uses a hard-coded constant instead. This patch fixes that by: - defining CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR to the hardcoded constant at toplevel, if not present in cuda.h, and - using CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR in nvptx_open_device Build on x86_64 with nvptx accelerator and reg-tested libgomp. 2018-08-08 Tom de Vries <tdevries@suse.de> * plugin/plugin-nvptx.c (CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR): Define. (nvptx_open_device): Use CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR. From-SVN: r263406	2018-08-08 14:26:19 +00:00
Tom de Vries	94767dacea	[libgomp, nvptx] Note that cuGetErrorString is in CUDA_VERSION >= 6000 Cuda driver api function cuGetErrorString is available in version 6.0 and higher. This patch: - removes a comment saying the declaration is not available in cuda.h 6.0 - fixes the presence test to use CUDA_VERSION < 6000 - moves the declaration to toplevel Build on x86_64 with nvptx accelerator and reg-tested libgomp. 2018-08-08 Tom de Vries <tdevries@suse.de> * plugin/plugin-nvptx.c (cuda_error): Move declaration of cuGetErrorString ... (cuGetErrorString): ... here. Guard with CUDA_VERSION < 6000. From-SVN: r263405	2018-08-08 14:26:10 +00:00
Tom de Vries	02150de863	[libgomp, nvptx] Handle CUDA_ONE_CALL_MAYBE_NULL This patch adds handling of functions that may not be present in the cuda driver. Such a function can be declared using CUDA_ONE_CALL_MAYBE_NULL in cuda-lib.def, it can be called with the usual convenience macros, but before calling its presence needs to be tested using new macro CUDA_CALL_EXISTS. When using the dlopen interface (PLUGIN_NVPTX_DYNAMIC == 1), we allow non-present functions by allowing dlsym to return NULL. Otherwise (PLUGIN_NVPTX_DYNAMIC == 0) we declare the non-present function to be weak. Build and reg-tested libgomp on x86_64 with nvidia accelerator, with and without --disable-cuda-driver, in combination with a trigger patch that adds a non-existing function foo to cuda-lib.def: ... CUDA_ONE_CALL_MAYBE_NULL (foo) ... and declares it in plugin-nvptx.c: ... CUresult foo (void); ... and then uses it in nvptx_init after the init_cuda_lib call: ... if (CUDA_CALL_EXISTS (foo)) CUDA_CALL (foo); ... Also build and reg-tested on x86_64 with nvidia accelerator, with and without --disable-cuda-driver, in combination with a trigger patch that replaces all CUDA_ONE_CALLs in cuda-lib.def with CUDA_ONE_CALL_MAYBE_NULL, and guards two CUDA_CALLs with CUDA_CALL_EXISTS, one for a regular fn, and one for a fn that is a define in cuda/cuda.h. 2018-08-07 Tom de Vries <tdevries@suse.de> * plugin/plugin-nvptx.c (DO_PRAGMA): Define. (struct cuda_lib_s): Add def/undef of CUDA_ONE_CALL_MAYBE_NULL. (init_cuda_lib): Add new param to CUDA_ONE_CALL_1. Add arg to corresponding call in CUDA_ONE_CALL. Add def/undef of CUDA_ONE_CALL_MAYBE_NULL. (CUDA_CALL_EXISTS): Define. From-SVN: r263346	2018-08-06 22:13:56 +00:00
Tom de Vries	9e28b10779	[libgomp, nvptx] Minimize lifetime of CUDA_ONE_CALL defines This patch makes sure that the lifetimes of the CUDA_ONE_CALL macro (which is defined twice in plugin-nvptx.c) are minimized, to make it obvious that the definitions are used only in the lib-cuda.def include. Build on x86_64 with nvptx accelerator and reg-tested libgomp. 2018-08-07 Tom de Vries <tdevries@suse.de> * plugin/plugin-nvptx.c (struct cuda_lib_s, init_cuda_lib): Put CUDA_ONE_CALL defines right before the cuda-lib.def include, and the corresponding undefs right after. From-SVN: r263345	2018-08-06 22:13:46 +00:00
Tom de Vries	099400909e	[libgomp, nvptx, --without-cuda-driver] Don't use system cuda driver Using libgomp configure option --with-cuda-driver=<dir> we can indicate what cuda driver to use to build the libgomp nvptx plugin. Without such an option, the system cuda driver is used, if available. If not availabe, a dlopen interface is used instead. However, when we use --without-cuda-driver (or the equivalent --with-cuda-driver=no) the system cuda driver is still used if available. This patch fixes that, making sure that --without-cuda-driver selects the dlopen interface. Build on x86_64 with nvptx accelerator and tested libgomp testsuite, with and without option --without-cuda-driver. 2018-08-04 Tom de Vries <tdevries@suse.de> * plugin/configfrag.ac: For --without-cuda-driver, set CUDA_DRIVER_INCLUDE and CUDA_DRIVER_LIB to no. Handle CUDA_DRIVER_INCLUDE == no and CUDA_DRIVER_LIB == no. * configure: Regenerate. From-SVN: r263310	2018-08-04 20:07:22 +00:00
Cesar Philippidis	094db6beb9	[PATCH] Remove use of 'struct map' from plugin (nvptx) libgomp/ * plugin/plugin-nvptx.c (struct map): Removed. (map_init, map_pop): Remove use of struct map. (map_push): Likewise and change argument list. * testsuite/libgomp.oacc-c-c++-common/mapping-1.c: New Co-Authored-By: James Norris <jnorris@codesourcery.com> From-SVN: r263212	2018-08-01 07:09:56 -07:00
Tom de Vries	8c6310a2c2	[libgomp, nvptx] Add cuda-lib.def 2018-08-01 Tom de Vries <tdevries@suse.de> * plugin/cuda-lib.def: New file. Factor out of ... * plugin/plugin-nvptx.c (CUDA_CALLS): ... here. (struct cuda_lib_s, init_cuda_lib): Include cuda-lib.def instead of using CUDA_CALLS. From-SVN: r263208	2018-08-01 13:20:22 +00:00
Tom de Vries	4cdfee3f20	[libgomp, nvptx] Handle per-function max-threads-per-block in default dims Currently parallel-loop-1.c fails at -O0 on a Quadro M1200, because one of the kernel launch configurations exceeds the resources available in the device, due to the default dimensions chosen by the runtime. This patch fixes that by taking the per-function max_threads_per_block into account when using the default dimensions. 2018-07-30 Tom de Vries <tdevries@suse.de> * plugin/plugin-nvptx.c (MIN, MAX): Redefine. (nvptx_exec): Ensure worker and vector default dims don't exceed targ_fn->max_threads_per_block. From-SVN: r263062	2018-07-30 08:17:26 +00:00
Tom de Vries	0b210c43bb	[libgomp, nvptx] Calculate default dims per device The default dimensions are calculated using per-device properties, but initialized once and used on all devices. This patch fixes this problem by introducing per-device default dimensions. 2018-07-30 Tom de Vries <tdevries@suse.de> * plugin/plugin-nvptx.c (struct ptx_device): Add default_dims field. (nvptx_open_device): Init default_dims for device. (nvptx_exec): Use default_dims from device. From-SVN: r263061	2018-07-30 08:17:16 +00:00

1 2 3

104 Commits