OpenE2K/gcc - gcc - Expired Mentality Git

Author	SHA1	Message	Date
Jakub Jelinek	2c16eb3157	openmp: Add support for inoutset depend-kind This patch adds support for inoutset depend-kind in depend clauses. It is very similar to the in depend-kind in that a task with a dependency with that depend-kind is dependent on all previously created sibling tasks with matching address unless they have the same depend-kind. In the in depend-kind case everything is dependent except for in -> in dependency, for inoutset everything is dependent except for inoutset -> inoutset dependency. mutexinoutset is also similar (everything is dependent except for mutexinoutset -> mutexinoutset dependency), but there is also the additional restriction that only one task with mutexinoutset for each address can be scheduled at once (i.e. mutual exclusivity). For now we support mutexinoutset the same as inout/out, but the inoutset support is full. In order not to bump the ABI for dependencies each time (we've bumped it already once, the old ABI supports only inout/out and in depend-kind, the new ABI supports inout/out, mutexinoutset, in and depobj), this patch arranges for inoutset to be at least for the time being always handled as if it was specified through depobj even when it is not. So it uses the new ABI for that and inoutset is represented like depobj - pointer to a pair of pointers where the first one will be the actual address of the object mentioned in depend clause and second pointer will be (void *) GOMP_DEPEND_INOUTSET. 2022-05-17 Jakub Jelinek <jakub@redhat.com> gcc/ * tree-core.h (enum omp_clause_depend_kind): Add OMP_CLAUSE_DEPEND_INOUTSET. * tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_DEPEND_INOUTSET. * gimplify.cc (gimplify_omp_depend): Likewise. * omp-low.cc (lower_depend_clauses): Likewise. gcc/c-family/ * c-omp.cc (c_finish_omp_depobj): Handle OMP_CLAUSE_DEPEND_INOUTSET. gcc/c/ * c-parser.cc (c_parser_omp_clause_depend): Parse inoutset depend-kind. (c_parser_omp_depobj): Likewise.
gcc/cp/ * parser.cc (cp_parser_omp_clause_depend): Parse inoutset depend-kind. (cp_parser_omp_depobj): Likewise. * cxx-pretty-print.cc (cxx_pretty_printer::statement): Handle OMP_CLAUSE_DEPEND_INOUTSET. gcc/testsuite/ * c-c++-common/gomp/all-memory-1.c (boo): Add test with inoutset depend-kind. * c-c++-common/gomp/all-memory-2.c (boo): Likewise. * c-c++-common/gomp/depobj-1.c (f1): Likewise. (f2): Adjusted expected diagnostics. * g++.dg/gomp/depobj-1.C (f4): Adjust expected diagnostics. include/ * gomp-constants.h (GOMP_DEPEND_INOUTSET): Define. libgomp/ * libgomp.h (struct gomp_task_depend_entry): Change is_in type from bool to unsigned char. * task.c (gomp_task_handle_depend): Handle GOMP_DEPEND_INOUTSET. Ignore dependencies where task->depend[i].is_in && task->depend[i].is_in == ent->is_in rather than just task->depend[i].is_in && ent->is_in. Remember whether GOMP_DEPEND_IN loop is needed and guard the loop with that conditional. (gomp_task_maybe_wait_for_dependencies): Handle GOMP_DEPEND_INOUTSET. Ignore dependencies where elem.is_in && elem.is_in == ent->is_in rather than just elem.is_in && ent->is_in. * testsuite/libgomp.c-c++-common/depend-1.c (test): Add task with inoutset depend-kind. * testsuite/libgomp.c-c++-common/depend-2.c (test): Likewise. * testsuite/libgomp.c-c++-common/depend-3.c (test): Likewise. * testsuite/libgomp.c-c++-common/depend-inoutset-1.c: New test.	2022-05-17 15:40:27 +02:00
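The skip rule described in the task.c entries above can be sketched in a few lines of C. This is only an illustration: the numeric encoding of `is_in` (0 for out/inout, 1 for in, 2 for inoutset) and the names `KIND_*` and `must_wait` are assumptions made for the example, not libgomp's actual definitions.

```c
#include <stdbool.h>

/* Assumed encoding of the widened is_in field (hypothetical values):
   0 = out/inout (mutexinoutset is treated the same for now),
   1 = in, 2 = inoutset.  */
enum { KIND_OUT_OR_INOUT = 0, KIND_IN = 1, KIND_INOUTSET = 2 };

/* Returns true when a new task whose dependence has kind `later` must wait
   for an earlier sibling task whose dependence on the same address has kind
   `earlier`.  Mirrors the condition quoted in the ChangeLog: a dependence is
   ignored when is_in is nonzero and equal on both sides.  */
bool must_wait(unsigned char earlier, unsigned char later)
{
    bool ignored = later && later == earlier;
    return !ignored;
}
```

With this encoding, in -> in and inoutset -> inoutset pairs are independent, while in -> inoutset, inoutset -> in, and anything involving out/inout still order the tasks. A plain `bool` could not distinguish the two nonzero kinds, which is presumably why the field became an `unsigned char`.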
Tobias Burnus	4f94c38a92	OpenMP: Add omp_all_memory support to Fortran Fortran part to the C/C++/backend implementation r13-337-g7f78783dbedca0183d193e475262ca3c489fd365 gcc/fortran/ChangeLog: * dump-parse-tree.cc (show_omp_namelist): Handle omp_all_memory. * openmp.cc (gfc_match_omp_variable_list, gfc_match_omp_depend_sink, gfc_match_omp_clauses, resolve_omp_clauses): Likewise. * trans-openmp.cc (gfc_trans_omp_clauses, gfc_trans_omp_depobj): Likewise. * resolve.cc (resolve_symbol): Reject it as symbol. libgomp/ChangeLog: * libgomp.texi (OpenMP 5.1): Set omp_all_memory to 'Y'. * testsuite/libgomp.fortran/depend-5.f90: New test. * testsuite/libgomp.fortran/depend-6.f90: New test. * testsuite/libgomp.fortran/depend-7.f90: New test. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/all-memory-1.f90: New test. * gfortran.dg/gomp/all-memory-2.f90: New test. * gfortran.dg/gomp/all-memory-3.f90: New test.	2022-05-17 11:01:04 +02:00
Marcel Vollweiler	b4fb9f4f9a	OpenMP, C++: Add template support for the has_device_addr clause. This patch adds support for list items in the has_device_addr clause which type is given by C++ template parameters. gcc/cp/ChangeLog: * pt.cc (tsubst_omp_clauses): Added OMP_CLAUSE_HAS_DEVICE_ADDR. * semantics.cc (finish_omp_clauses): Added template decl processing. libgomp/ChangeLog: * testsuite/libgomp.c++/target-has-device-addr-7.C: New test. * testsuite/libgomp.c++/target-has-device-addr-8.C: New test. * testsuite/libgomp.c++/target-has-device-addr-9.C: New test.	2022-05-16 01:02:50 -07:00
Tobias Burnus	70d624ff06	libgomp.fortran/target-nowait-array-section.f90: Fix typo Fix typo as requested in the review approval. libgomp/ChangeLog: * testsuite/libgomp.fortran/target-nowait-array-section.f90: New test.	2022-05-13 20:04:38 +02:00
Tobias Burnus	a46d626837	OpenMP/Fortran: Use firstprivate not alloc for ptr attach for arrays For a non-descriptor array, map(A(n:m)) was mapped as map(tofrom:A[n-1] [len: ...]) map(alloc:A [pointer assign, bias: ...]) with this patch, it is changed to map(tofrom:A[n-1] [len: ...]) map(firstprivate:A [pointer assign, bias: ...]) The latter avoids an alloc - and also avoids the race condition with nowait in the enclosed testcase. (Note: pedantically, the testcase is invalid since OpenMP 5.1, violating the map clause restriction at [354:10-13].) gcc/fortran/ChangeLog: * trans-openmp.cc (gfc_trans_omp_clauses): When mapping nondescriptor array sections, use GOMP_MAP_FIRSTPRIVATE_POINTER instead of GOMP_MAP_POINTER for the pointer attachment. libgomp/ChangeLog: * testsuite/libgomp.fortran/target-nowait-array-section.f90: New test.	2022-05-13 20:00:34 +02:00
Thomas Schwinge	dcc266796a	Refactor '-ldl' handling for libgomp proper and plugins Instead of implicit global 'LIBS="-ldl $LIBS"' via 'AC_CHECK_LIB', make '-ldl' explicit for libgomp proper, and clean up 'PLUGIN_GCN_LIBS', 'PLUGIN_NVPTX_LIBS' accordingly. libgomp/ * Makefile.am (libgomp_la_LIBADD): Initialize. * plugin/configfrag.ac (DL_LIBS): New. (PLUGIN_GCN_LIBS): Remove. (PLUGIN_NVPTX_LIBS): Don't set in the 'PLUGIN_NVPTX_DYNAMIC' case. * plugin/Makefrag.am (libgomp_la_LIBADD) (libgomp_plugin_gcn_la_LIBADD): Consider '$(DL_LIBS)'. (libgomp_plugin_nvptx_la_LIBADD) <PLUGIN_NVPTX_DYNAMIC>: Likewise. * Makefile.in: Regenerate. * config.h.in: Likewise. * configure: Likewise. * testsuite/Makefile.in: Likewise.	2022-05-12 15:11:30 +02:00
Thomas Schwinge	edbd2b1caa	libgomp plugins: Don't 'AC_SUBST' and 'AC_DEFINE_UNQUOTED' for 'PLUGIN_GCN', 'PLUGIN_NVPTX' Nothing ever used these. libgomp/ * plugin/configfrag.ac: Don't 'AC_SUBST' and 'AC_DEFINE_UNQUOTED' for 'PLUGIN_GCN', 'PLUGIN_NVPTX'. * Makefile.in: Regenerate. * config.h.in: Likewise. * configure: Likewise. * testsuite/Makefile.in: Likewise.	2022-05-12 13:28:23 +02:00
Jakub Jelinek	7f78783dbe	openmp: Add omp_all_memory support (C/C++ only so far) The ugly part is that OpenMP 5.1 made omp_all_memory a reserved identifier which isn't allowed to be used anywhere but in the depend clause, this is against how everything else has been handled in OpenMP so far (where some identifiers could have special meaning in some OpenMP clauses or pragmas but not elsewhere). The patch handles it by making it a conditional keyword (for -fopenmp only) and emitting better diagnostics when it is used in a primary expression. Having nicer diagnostics when e.g. trying to do int omp_all_memory; or int omp_all_memory[10]; etc. would mean changing too many spots and hooking into name lookups to reject declaring any such symbols would be too ugly and I'm afraid there are way too many spots where one can introduce a name (variables, functions, namespaces, struct, enum, enumerators, template arguments, ...). Otherwise, the handling is quite simple, normal depend clauses lower into addresses of variables being handed over to the library, for omp_all_memory I'm using NULL pointers. omp_all_memory can only be used with inout or out depend kinds and means that a task is dependent on all previously created sibling tasks that have any dependency (of any depend kind) and that any later created sibling tasks will be dependent on it if they have any dependency. 2022-05-12 Jakub Jelinek <jakub@redhat.com> gcc/ * gimplify.cc (gimplify_omp_depend): Don't build_fold_addr_expr if null_pointer_node. (gimplify_scan_omp_clauses): Likewise. * tree-pretty-print.cc (dump_omp_clause): Print null_pointer_node as omp_all_memory. gcc/c-family/ * c-common.h (enum rid): Add RID_OMP_ALL_MEMORY. * c-omp.cc (c_finish_omp_depobj): Don't build_fold_addr_expr if null_pointer_node. gcc/c/ * c-parser.cc (c_parse_init): Register omp_all_memory as keyword if flag_openmp. (c_parser_postfix_expression): Diagnose uses of omp_all_memory in postfix expressions.
(c_parser_omp_variable_list): Handle omp_all_memory in depend clause. * c-typeck.cc (c_finish_omp_clauses): Handle omp_all_memory keyword in depend clause as null_pointer_node, diagnose invalid uses. gcc/cp/ * lex.cc (init_reswords): Register omp_all_memory as keyword if flag_openmp. * parser.cc (cp_parser_primary_expression): Diagnose uses of omp_all_memory in postfix expressions. (cp_parser_omp_var_list_no_open): Handle omp_all_memory in depend clause. * semantics.cc (finish_omp_clauses): Handle omp_all_memory keyword in depend clause as null_pointer_node, diagnose invalid uses. * pt.cc (tsubst_omp_clause_decl): Pass through omp_all_memory. gcc/testsuite/ * c-c++-common/gomp/all-memory-1.c: New test. * c-c++-common/gomp/all-memory-2.c: New test. * c-c++-common/gomp/all-memory-3.c: New test. * g++.dg/gomp/all-memory-1.C: New test. * g++.dg/gomp/all-memory-2.C: New test. libgomp/ * libgomp.h (struct gomp_task): Add depend_all_memory member. * task.c (gomp_init_task): Initialize depend_all_memory. (gomp_task_handle_depend): Handle omp_all_memory. (gomp_task_run_post_handle_depend_hash): Clear parent->depend_all_memory if equal to current task. (gomp_task_maybe_wait_for_dependencies): Handle omp_all_memory. * testsuite/libgomp.c-c++-common/depend-1.c: New test. * testsuite/libgomp.c-c++-common/depend-2.c: New test. * testsuite/libgomp.c-c++-common/depend-3.c: New test.	2022-05-12 08:31:20 +02:00
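As a rough model of the semantics described in this commit, the following C sketch treats a NULL address as omp_all_memory, matching the statement that NULL pointers are handed to the library for it. The struct `dep_item` and the function `items_conflict` are hypothetical names for illustration and do not reflect libgomp's internal data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one depend-clause item; not libgomp's layout.  */
struct dep_item {
    const void *addr;   /* NULL stands for omp_all_memory */
    bool is_in;         /* true for depend(in: ...); never true with NULL,
                           since omp_all_memory requires out or inout */
};

/* Returns true when the task owning `later` must wait for the earlier
   sibling task owning `earlier`.  */
bool items_conflict(const struct dep_item *earlier,
                    const struct dep_item *later)
{
    /* omp_all_memory on either side matches every address.  */
    bool same_object = earlier->addr == NULL || later->addr == NULL
                       || earlier->addr == later->addr;
    if (!same_object)
        return false;
    /* Two readers of the same object never order each other.  */
    return !(earlier->is_in && later->is_in);
}
```

Under this model a task with `depend(out: omp_all_memory)` waits for every earlier sibling that has any dependence, and every later sibling with any dependence waits for it, which is the behaviour the commit message describes.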
Thomas Schwinge	876ac21b7e	libgomp: Remove unused '--with-hsa-runtime', '--with-hsa-runtime-include', '--with-hsa-runtime-lib' With recent commit `2e309a4eff` "libgomp testsuite: Don't amend 'LD_LIBRARY_PATH' for system-provided HSA Runtime library", and commit `d6adba3075` "libgomp GCN plugin: Clean up unused references to system-provided HSA Runtime library", the last uses of '--with-hsa-runtime' etc. are gone. gcc/ * doc/install.texi: Don't document '--with-hsa-runtime', '--with-hsa-runtime-include', '--with-hsa-runtime-lib'. libgomp/ * plugin/configfrag.ac: Remove '--with-hsa-runtime', '--with-hsa-runtime-include', '--with-hsa-runtime-lib' processing. * Makefile.in: Regenerate. * configure: Likewise. * testsuite/Makefile.in: Likewise.	2022-05-11 14:27:42 +02:00
Thomas Schwinge	91a6dcd149	libgomp GCN plugin: Clean up always-empty 'PLUGIN_GCN_CPPFLAGS', 'PLUGIN_GCN_LDFLAGS' After recent commit `d6adba3075` "libgomp GCN plugin: Clean up unused references to system-provided HSA Runtime library", these aren't set anymore. libgomp/ * plugin/Makefrag.am (libgomp_plugin_gcn_la_CPPFLAGS): Don't consider 'PLUGIN_GCN_CPPFLAGS'. (libgomp_plugin_gcn_la_LDFLAGS): Don't consider 'PLUGIN_GCN_LDFLAGS'. * plugin/configfrag.ac (PLUGIN_GCN_CPPFLAGS, PLUGIN_GCN_LDFLAGS): Remove. * Makefile.in: Regenerate. * configure: Likewise. * testsuite/Makefile.in: Likewise.	2022-05-11 14:25:58 +02:00
Thomas Schwinge	2e309a4eff	libgomp testsuite: Don't amend 'LD_LIBRARY_PATH' for system-provided HSA Runtime library This is only active if GCC is 'configure'd with '--with-hsa-runtime=[...]' or '--with-hsa-runtime-lib=[...]' -- which nobody really is doing, as far as I can tell. 'libgomp/testsuite/lib/libgomp.exp:libgomp_init' states: # For build-tree testing, also consider the library paths used for builing. # For installed testing, we assume all that to be provided in the sysroot. if { $blddir != "" } { [...] global hsa_runtime_lib if { $hsa_runtime_lib != "" } { append always_ld_library_path ":$hsa_runtime_lib" } } However, the libgomp GCN plugin is unconditionally built against the GCC-shipped 'include/hsa.h' header files, and at run time does 'dlopen("libhsa-runtime64.so.1")', so there is no system-provided HSA Runtime library "used for builing". It thus doesn't make sense to amend 'LD_LIBRARY_PATH' for system-provided HSA Runtime library. libgomp/ * testsuite/lib/libgomp.exp (libgomp_init): Don't 'append always_ld_library_path ":$hsa_runtime_lib"'. * testsuite/libgomp-test-support.exp.in (hsa_runtime_lib): Don't set.	2022-05-11 14:24:21 +02:00
Thomas Schwinge	7981524755	Fix up 'libgomp.fortran/use_device_addr-5.f90' multi-device testing Fix-up for recent commit r13-116-g3f8c389fe90bf565a6221a46bb7fb745dd4c1510 "OpenMP: Fix use_device_{addr,ptr} with in-data-sharing arg", where we currently get: libgomp: use_device_ptr pointer wasn't mapped FAIL: libgomp.fortran/use_device_addr-5.f90 -O execution test libgomp/ * testsuite/libgomp.fortran/use_device_addr-5.f90: Fix up multi-device testing.	2022-05-10 14:48:11 +02:00
Marcel Vollweiler	4043f53cb4	OpenMP, libgomp: Add new runtime routine omp_target_is_accessible. gcc/ChangeLog: * omp-low.cc (omp_runtime_api_call): Added target_is_accessible to omp_runtime_apis array. libgomp/ChangeLog: * libgomp.map: Added omp_target_is_accessible. * libgomp.texi: Tagged omp_target_is_accessible as supported. * omp.h.in: Added omp_target_is_accessible. * omp_lib.f90.in: Added interface for omp_target_is_accessible. * omp_lib.h.in: Likewise. * target.c (omp_target_is_accessible): Added implementation of omp_target_is_accessible. * testsuite/libgomp.c-c++-common/target-is-accessible-1.c: New test. * testsuite/libgomp.fortran/target-is-accessible-1.f90: New test.	2022-05-06 07:28:26 -07:00
Tobias Burnus	3f8c389fe9	OpenMP: Fix use_device_{addr,ptr} with in-data-sharing arg For array-descriptor vars, the descriptor is assigned to a temporary. However, this failed when the clause's argument was in turn in a data-sharing clause as the outer context's VALUE_EXPR wasn't used. gcc/ChangeLog: * omp-low.cc (lower_omp_target): Fix use_device_{addr,ptr} with list item that is in an outer data-sharing clause. libgomp/ChangeLog: * testsuite/libgomp.fortran/use_device_addr-5.f90: New test.	2022-05-04 18:18:44 +02:00
Marcel Vollweiler	941cdc8b6d	OpenMP, libgomp: Add new runtime routine omp_get_mapped_ptr. This patch adds the OpenMP runtime routine "omp_get_mapped_ptr" which was introduced in OpenMP 5.1. gcc/ChangeLog: * omp-low.cc (omp_runtime_api_call): Added get_mapped_ptr to omp_runtime_apis array. libgomp/ChangeLog: * libgomp.map: Added omp_get_mapped_ptr. * libgomp.texi: Tagged omp_get_mapped_ptr as supported. * omp.h.in: Added omp_get_mapped_ptr. * omp_lib.f90.in: Added interface for omp_get_mapped_ptr. * omp_lib.h.in: Likewise. * target.c (omp_get_mapped_ptr): Added implementation of omp_get_mapped_ptr. * testsuite/libgomp.c-c++-common/get-mapped-ptr-1.c: New test. * testsuite/libgomp.c-c++-common/get-mapped-ptr-2.c: New test. * testsuite/libgomp.c-c++-common/get-mapped-ptr-3.c: New test. * testsuite/libgomp.c-c++-common/get-mapped-ptr-4.c: New test. * testsuite/libgomp.fortran/get-mapped-ptr-1.f90: New test. * testsuite/libgomp.fortran/get-mapped-ptr-2.f90: New test. * testsuite/libgomp.fortran/get-mapped-ptr-3.f90: New test. * testsuite/libgomp.fortran/get-mapped-ptr-4.f90: New test.	2022-05-02 23:56:44 -07:00
Thomas Schwinge	2a570f11a2	Fix up 'libgomp.oacc-fortran/print-1.f90' GCN offloading compilation [PR104717] That got broken by recent commit `b220243191` "fortran: Fix up gfc_trans_oacc_construct [PR104717]". PR fortran/104717 libgomp/ * testsuite/libgomp.oacc-fortran/print-1.f90: Add OpenACC privatization scanning. For GCN offloading compilation, raise '-mgang-private-size'.	2022-04-28 15:15:29 +02:00
Jakub Jelinek	b220243191	fortran: Fix up gfc_trans_oacc_construct [PR104717] So that move_sese_region_to_fn works properly, OpenMP/OpenACC constructs for which that function is invoked need an extra artificial BIND_EXPR around their body so that we move all variables of the bodies. The C/C++ FEs do that both for OpenMP constructs like OMP_PARALLEL, OMP_TASK or OMP_TARGET and for OpenACC constructs that behave similarly to OMP_TARGET, but the Fortran FE only does that for OpenMP constructs. The following patch does that for OpenACC constructs too. PR fortran/104717 gcc/fortran/ * trans-openmp.cc (gfc_trans_oacc_construct): Wrap construct body in an extra BIND_EXPR. gcc/testsuite/ * gfortran.dg/goacc/pr104717.f90: New test. * gfortran.dg/goacc/privatization-1-compute-loop.f90: Adjust. libgomp/ * testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Adjust. Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>	2022-04-25 23:14:02 +02:00
Chung-Lin Tang	b0af8e3a50	OpenMP: Fix nested use_device_ptr This patch fixes a bug in lower_omp_target, where for Fortran arrays, the expanded sender assignment is wrongly using the variable in the current ctx, instead of the one looked-up outside, which is causing use_device_ptr/addr to fail to work when used inside an omp-parallel (where the omp child_fn is split away from the original). The fix is inside omp-low.cc, though because the omp_array_data langhook is used only by Fortran, this is essentially Fortran-specific. 2022-04-05 Chung-Lin Tang <cltang@codesourcery.com> gcc/ChangeLog: * omp-low.cc (lower_omp_target): Use outer context looked-up 'var' as argument to lang_hooks.decls.omp_array_data, instead of 'ovar' from current clause. libgomp/ChangeLog: * testsuite/libgomp.fortran/use_device_ptr-4.f90: New testcase.	2022-04-05 08:31:34 -07:00
Tom de Vries	88cffa1a07	[libgomp/testsuite] Fix libgomp.fortran/examples-4/declare_target-{1,2}.f90 The test-cases libgomp.fortran/examples-4/declare_target-{1,2}.f90 mean to set an nvptx-specific limit using offload_target_nvptx, but also change behaviour for amd. That is, there is now a difference in behaviour between: - a compiler configured for GCN offloading, and - a compiler configured for both GCN and nvptx offloading. Fix this by using instead on_device_arch_nvptx. Tested on x86_64 with nvptx accelerator. libgomp/ChangeLog: 2022-04-04 Tom de Vries <tdevries@suse.de> * testsuite/libgomp.fortran/examples-4/declare_target-1.f90: Use on_device_arch_nvptx instead of offload_target_nvptx. * testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Same.	2022-04-04 13:37:19 +02:00
Tom de Vries	bfa9f660d2	[libgomp, testsuite, nvptx] Limit recursion in declare_target-{1,2}.f90 When running testcases libgomp.fortran/examples-4/declare_target-{1,2}.f90 on an RTX A2000 (sm_86) with driver 510.60.02 and with GOMP_NVPTX_JIT=-O0 I run into: ... FAIL: libgomp.fortran/examples-4/declare_target-1.f90 -O0 \ -DGOMP_NVPTX_JIT=-O0 execution test FAIL: libgomp.fortran/examples-4/declare_target-2.f90 -O0 \ -DGOMP_NVPTX_JIT=-O0 execution test ... Fix this by further limiting recursion depth in the test-cases for nvptx. Furthermore, make the recursion depth limiting nvptx-specific. Tested on x86_64 with nvptx accelerator. libgomp/ChangeLog: 2022-04-01 Tom de Vries <tdevries@suse.de> * testsuite/libgomp.fortran/examples-4/declare_target-1.f90: Define and use REC_DEPTH. * testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Same.	2022-04-01 13:23:16 +02:00
Tom de Vries	065e25f633	[libgomp, testsuite, nvptx] Fix dg-output test in vector-length-128-7.c When running test-case libgomp.oacc-c-c++-common/vector-length-128-7.c on an RTX A2000 (sm_86) with driver 510.60.02 I run into: ... FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-7.c \ -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 \ output pattern test ... The failing check verifies the launch dimensions: ... /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: \ launch gangs=1, workers=8, vectors=128" } */ ... which fails because (as we can see with GOMP_DEBUG=1) the actual num_workers is 6: ... nvptx_exec: kernel main$_omp_fn$0: launch gangs=1, workers=6, vectors=128 ... This is due to the result of cuOccupancyMaxPotentialBlockSize (which suggests 'a launch configuration with reasonable occupancy') printed just before: ... cuOccupancyMaxPotentialBlockSize: grid = 52, block = 768 ... [ Note: 6 * 128 == 768. ] Fix this by updating the check to allow num_workers in the range 1 to 8. Tested on x86_64 with nvptx accelerator. libgomp/ChangeLog: 2022-04-01 Tom de Vries <tdevries@suse.de> * testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Fix num_workers check.	2022-04-01 13:22:07 +02:00
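The arithmetic behind the note in this commit (a suggested block size of 768 threads with a vector length of 128 leaves room for 6 workers) can be written out as a small C helper. The function name and the clamping to the requested worker count are assumptions for illustration; the nvptx plugin's real launch-dimension logic is more involved.

```c
/* Hypothetical helper: number of workers that fit into a suggested block
   size for a given vector length, capped at the requested worker count.
   Assumes vector_length > 0.  */
int workers_for_block(int block_size, int vector_length,
                      int requested_workers)
{
    int fit = block_size / vector_length;
    return fit < requested_workers ? fit : requested_workers;
}
```

For the values quoted above, `workers_for_block(768, 128, 8)` gives 6, which is why a check hard-coded to `workers=8` fails on that device and the test now accepts any value from 1 to 8.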
Tom de Vries	8570cce7c7	[libgomp, testsuite] Scale down some OpenACC test-cases When a display manager is running on an nvidia card, all CUDA kernel launches get a 5 seconds watchdog timer. Consequently, when running the libgomp testsuite with nvptx accelerator and GOMP_NVPTX_JIT=-O0 we run into a few FAILs like this: ... libgomp: cuStreamSynchronize error: the launch timed out and was terminated FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \ -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 \ execution test ... Fix this by scaling down the failing test-cases by default, and reverting to the original behaviour for GCC_TEST_RUN_EXPENSIVE=1. Tested on x86_64-linux with nvptx accelerator. libgomp/ChangeLog: 2022-03-25 Tom de Vries <tdevries@suse.de> PR libgomp/105042 * testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Reduce execution time. * testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Same. * testsuite/libgomp.oacc-fortran/parallel-dims.f90: Same.	2022-03-25 13:51:48 +01:00
Tobias Burnus	1002a7ace1	LTO: Fixes for renaming issues with offload/OpenMP [PR104285] gcc/lto/ChangeLog: PR middle-end/104285 * lto-partition.cc (maybe_rewrite_identifier): Use get_identifier for the returned string to be usable as hash key. (validize_symbol_for_target): Hence, use return value directly. (privatize_symbol_name_1): Track maybe_rewrite_identifier renames. * lto.cc (offload_handle_link_vars): Move function up before ... (do_whole_program_analysis): Call it after static renamings. (lto_main): Move call after static renamings. libgomp/ChangeLog: PR middle-end/104285 * testsuite/libgomp.c++/target-same-name-2-a.C: New test. * testsuite/libgomp.c++/target-same-name-2-b.C: New test. * testsuite/libgomp.c++/target-same-name-2.C: New test. * testsuite/libgomp.c-c++-common/target-same-name-1-a.c: New test. * testsuite/libgomp.c-c++-common/target-same-name-1-b.c: New test. * testsuite/libgomp.c-c++-common/target-same-name-1.c: New test.	2022-03-23 09:44:39 +01:00
Tom de Vries	a624388b95	[nvptx] Add warp sync at simt exit Consider this code (with N defined to 1024): ... float v = 0.0; #pragma omp target map(tofrom: v) #pragma omp parallel for simd for (int i = 0 ; i < N; i++) { #pragma omp atomic update v = v + 1.0; } ... It hangs when executing on target board unix/-foffload=-misa=sm_75, using drivers 470.103.01 and 510.54 on a T400 board (sm_75). I'm tentatively identifying the problem as a bug in -muniform-simt for architectures that support Independent Thread Scheduling (sm_70 and later). The problem -muniform-simt is trying to address is to make sure that a register produced outside an openmp simd region is available when used in any lane inside an simd region. The solution is to, outside an simd region, execute in all warp lanes, thus producing consistent values in result registers in each warp thread. This approach doesn't work when executing in all warp lanes multiplies the side effects from 1 to 32 separate side effects, which is the case for atomic insns. So atomic insns are rewritten to execute only in lane 0, and if there are any results, those are propagated to the other threads in the warp. [ And likewise for system calls malloc, free, vprintf. ] Now, consider a non-atomic update: ld, add, store. The store has side effects, are those multiplied or not? Pre-sm_70 we can assume that at the end of an SIMT region, any divergent control flow has reconverged, and we have a uniform warp, executing in lock step. So: - the load will load the same value into the result register across the warp, - the add will write the same value into the result register across the warp, - the store will write the same value to the same memory location, 32 times, at once, having the result of a single store. So, no side-effect multiplication (well, at least that's the observation). Starting sm_70, the threads in a warp are no longer guaranteed to reconverge after divergence. 
There's a "Convergence Optimizer" that can identify that it is safe for a warp to reconverge, but that works only as long as the code does not contain "synchronizing operations". Consequently, the ld, add, store sequence can be executed by a non-uniform warp, which means the side effects can have multiplied, and the registers are no longer guaranteed to be in sync. The atomic update in the example above is translated using an atom.cas loop, which means that we have divergence (because only one thread is allowed to succeed at a time) and the "Convergence Optimizer" doesn't reconverge probably because the atom.cas counts as a "synchronizing operation". So, it seems plausible that the root cause for the mentioned hang is the problem described above. Fix this by adding an explicit warp sync at simt exit. Note that we're assuming here that the warp will stay uniform until the next SIMT region entry. Tested on x86_64 with nvptx accelerator. gcc/ChangeLog: 2022-03-09 Tom de Vries <tdevries@suse.de> PR target/104916 PR target/104783 * config/nvptx/nvptx.md (define_expand "omp_simt_exit"): Emit warp sync (or uniform warp check for mptx < 6.0). libgomp/ChangeLog: 2022-03-15 Tom de Vries <tdevries@suse.de> PR target/104916 PR target/104783 * testsuite/libgomp.c/pr104783-2.c: New test.	2022-03-22 14:35:34 +01:00
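For reference, the snippet quoted at the start of this commit message can be wrapped into a self-contained C function roughly as follows. The function name `count_with_atomic` is an assumption for the example, and this is not the added test file pr104783-2.c itself. Built without `-fopenmp` the pragmas are ignored and the loop runs serially.

```c
#define N 1024

/* Increments v once per iteration under an atomic update inside a
   target/parallel-for-simd region, as in the commit message.  */
float count_with_atomic(void)
{
    float v = 0.0;
#pragma omp target map(tofrom: v)
#pragma omp parallel for simd
    for (int i = 0; i < N; i++) {
#pragma omp atomic update
        v = v + 1.0;
    }
    return v;
}
```

A correct run yields `v == N`, since each of the 1024 iterations adds exactly 1 and every intermediate value is exactly representable as a float. The hang described above is what happened instead when this pattern was offloaded to an sm_75 device.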
Tobias Burnus	c133bdfa9e	Fortran/OpenMP: Fix privatization of associated names gfc_omp_predetermined_sharing causes the associate-name pointer variable to be OMP_CLAUSE_DEFAULT_FIRSTPRIVATE, which is fine. However, the associated selector is shared. Thus, the target of associate-name pointer should not get copied. (It was before but because of gfc_omp_privatize_by_reference returning false, the selector was not only wrongly copied but this was also not done properly.) gcc/fortran/ChangeLog: PR fortran/103039 * trans-openmp.cc (gfc_omp_clause_copy_ctor, gfc_omp_clause_dtor): Only privatize pointer for associate names. libgomp/ChangeLog: PR fortran/103039 * testsuite/libgomp.fortran/associate4.f90: New test.	2022-03-18 17:40:22 +01:00
Tom de Vries	093cdadbce	[openmp] Fix SIMT reduction using TRUTH_{AND,OR}IF_EXPR Consider test-case pr104952-1.c, included in this commit, containing: ... #pragma omp target map(tofrom:result) map(to:arr) #pragma omp simd reduction(||: result) ... When run on x86_64 with nvptx accelerator, the test-case either aborts or hangs. The reduction clause is translated by the SIMT code (active for nvptx) as a butterfly reduction loop with this butterfly shuffle / update pair: ... D.2163 = D.2163 || .GOMP_SIMT_XCHG_BFLY (D.2163, D.2164) ... in the loop body. The problem is that the butterfly shuffle is possibly not executed, while it needs to be executed unconditionally. Fix this by translating instead as: ... D.tmp_bfly = .GOMP_SIMT_XCHG_BFLY (D.2163, D.2164) D.2163 = D.2163 || D.tmp_bfly ... Tested on x86_64-linux with nvptx accelerator. gcc/ChangeLog: 2022-03-17 Tom de Vries <tdevries@suse.de> PR target/104952 * omp-low.cc (lower_rec_input_clauses): Make sure GOMP_SIMT_XCHG_BFLY is executed unconditionally. libgomp/ChangeLog: 2022-03-17 Tom de Vries <tdevries@suse.de> PR target/104952 * testsuite/libgomp.c/pr104952-1.c: New test. * testsuite/libgomp.c/pr104952-2.c: New test.	2022-03-18 15:45:13 +01:00
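The difference between the two translations in this commit comes down to C's short-circuit evaluation of the logical OR operator. The sketch below uses a hypothetical host-side stand-in, `xchg_bfly`, that merely counts its calls; it is not the real `.GOMP_SIMT_XCHG_BFLY` internal function, which exchanges values between warp lanes and therefore needs every lane to take part.

```c
#include <stdbool.h>

/* Counts how often the stand-in for the butterfly shuffle runs.  */
int xchg_calls = 0;

/* Hypothetical stand-in for the shuffle: returns the partner lane's value
   and records that it was called.  */
bool xchg_bfly(bool partner_value)
{
    xchg_calls++;
    return partner_value;
}

/* Pre-fix shape: the shuffle is skipped whenever acc is already true.  */
bool step_short_circuit(bool acc, bool partner_value)
{
    return acc || xchg_bfly(partner_value);
}

/* Post-fix shape: shuffle first, then combine the results.  */
bool step_unconditional(bool acc, bool partner_value)
{
    bool tmp_bfly = xchg_bfly(partner_value);
    return acc || tmp_bfly;
}
```

Both functions return the same value, but `step_short_circuit(true, ...)` never calls the shuffle. On real SIMT hardware that means lanes whose accumulator is already true would skip an operation their partner lanes are waiting on, which is consistent with the aborts and hangs reported above.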
Thomas Schwinge	c43cb355f2	Enhance further testcases to verify OpenACC 'kernels' decomposition gcc/testsuite/ * c-c++-common/goacc-gomp/nesting-1.c: Enhance. * c-c++-common/goacc/kernels-loop-g.c: Likewise. * c-c++-common/goacc/nesting-1.c: Likewise. * gcc.dg/goacc/nested-function-1.c: Likewise. * gfortran.dg/goacc/common-block-3.f90: Likewise. * gfortran.dg/goacc/nested-function-1.f90: Likewise. libgomp/ * testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Enhance. * testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c: Likewise. * testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.	2022-03-17 08:51:32 +01:00
Thomas Schwinge	004fc4f2fc	Enhance further testcases to verify handling of OpenACC privatization level [PR90115] As originally introduced in commit `11b8286a83` "[OpenACC privatization] Largely extend diagnostics and corresponding testsuite coverage [PR90115]". PR middle-end/90115 gcc/testsuite/ * c-c++-common/goacc-gomp/nesting-1.c: Enhance. * gfortran.dg/goacc/common-block-3.f90: Likewise. libgomp/ * testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Enhance. * testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.	2022-03-17 08:47:09 +01:00
Marcel Vollweiler	be093b8dcc	OpenMP, Fortran: Bugfix for omp_set_num_teams. This patch fixes a small bug in the omp_set_num_teams implementation. libgomp/ChangeLog: * fortran.c (omp_set_num_teams_8_): Call omp_set_num_teams instead of omp_set_max_active_levels. * testsuite/libgomp.fortran/icv-8.f90: New test.	2022-03-16 07:38:54 -07:00
Thomas Schwinge	ab46fc7c3b	OpenACC privatization diagnostics vs. 'assert' [PR102841] It's an orthogonal concern why these diagnostics do appear at all for non-offloaded OpenACC constructs (where they're not relevant at all); PR90115. Depending on how 'assert' is implemented, it may cause temporaries to be created, and/or may lower into 'COND_EXPR's, and 'gcc/gimplify.cc:gimplify_cond_expr' uses 'create_tmp_var (type, "iftmp")'. Fix-up for commit `11b8286a83` "[OpenACC privatization] Largely extend diagnostics and corresponding testsuite coverage [PR90115]". PR testsuite/102841 libgomp/ * testsuite/libgomp.oacc-c-c++-common/host_data-7.c: Adjust.	2022-03-16 10:12:09 +01:00
Thomas Schwinge	a07b8f4fb7	OpenACC 'kernels' decomposition: resolve wrong-code cases unless manually making certain variables addressable [PR100280, PR104892] Currently in OpenACC 'kernels' decomposition, there is special handling of 'GOMP_MAP_FORCE_TOFROM', documented to be done to avoid "internal compiler errors in later passes". For performance reasons, the current repetitive to/from device copying for every region is not ideal, compared to using 'present' clauses, as done for almost all other 'GOMP_MAP_*'. Also, the current special handling (incomplete, evidently) is the reason for the PR104892 misbehavior. For PR100280 etc. we've resolved all such known ICEs -- removing the special handling for 'GOMP_MAP_FORCE_TOFROM' now resolves PR104892. PR middle-end/100280 PR middle-end/104892 gcc/ * omp-oacc-kernels-decompose.cc (omp_oacc_kernels_decompose_1): Remove special handling of 'GOMP_MAP_FORCE_TOFROM'. gcc/testsuite/ * c-c++-common/goacc/kernels-decompose-2.c: Adjust. * c-c++-common/goacc/kernels-decompose-pr100400-1-1.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr100400-1-2.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr100400-1-3.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr100400-1-4.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104061-1-1.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104061-1-2.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104774-1.c: Likewise. * gfortran.dg/goacc/classify-kernels.f95: Likewise. * gfortran.dg/goacc/kernels-decompose-2.f95: Likewise. libgomp/ * testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise. * testsuite/libgomp.oacc-fortran/asyncwait-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90: Likewise.	2022-03-12 15:37:27 +01:00
Thomas Schwinge	535afbd959	OpenACC 'kernels' decomposition: wrong-code cases unless manually making certain variables addressable [PR104892] Document a few examples of the status quo. PR middle-end/104892 libgomp/ * testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Point to PR104892. * testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise, enable '--param=openacc-kernels=decompose' and adjust. * testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise. * testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90: Likewise.	2022-03-12 15:37:27 +01:00
Thomas Schwinge	2e53fa7bb2	Enhance further testcases to verify handling of OpenACC privatization level [PR90115] As originally introduced in commit `11b8286a83` "[OpenACC privatization] Largely extend diagnostics and corresponding testsuite coverage [PR90115]". PR middle-end/90115 libgomp/ * testsuite/libgomp.oacc-c-c++-common/default-1.c: Enhance. * testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise. * testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90: Likewise.	2022-03-12 14:00:46 +01:00
Thomas Schwinge	337ed336d7	OpenACC 'kernels' decomposition: Mark variables used in 'present' clauses as addressable [PR100280, PR104086] ... like in recent commit `9b32c1669a` "OpenACC 'kernels' decomposition: Mark variables used in synthesized data clauses as addressable [PR100280]". Otherwise, we may run into 'gcc/omp-low.cc:lower_omp_target': 13125 else if (is_gimple_reg (var)) 13126 { 13127 gcc_assert (offloaded); PR middle-end/100280 PR middle-end/104086 gcc/ * omp-oacc-kernels-decompose.cc (omp_oacc_kernels_decompose_1): Mark variables used in 'present' clauses as addressable. * omp-low.cc (scan_sharing_clauses) <OMP_CLAUSE_MAP>: Gracefully handle duplicate 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE'. gcc/testsuite/ * c-c++-common/goacc/kernels-decompose-pr104086-1.c: Adjust, extend. libgomp/ * testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c: Merge this... * testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c: ..., and this... * testsuite/libgomp.oacc-c-c++-common/declare-vla.c: ... into this, and adjust. * testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Extend.	2022-03-12 13:02:55 +01:00
Hafiz Abid Qadeer	7c2ac3cebd	Fix multiple issues in the testcase allocate-1.f90. 1. Thomas reported in https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589039.html that this testcase is randomly failing. The problem was the fixed pool size, which was exhausted when there were a lot of threads. Fixed it by removing the pool_size trait, which causes the default pool size to be used, which should be big enough. 2. Array indices have been changed to check the last element in the array. 3. Remove a redundant assignment and move some code to better match the C testcase. libgomp/ChangeLog: * testsuite/libgomp.fortran/allocate-1.f90: Remove pool_size trait. Test last index in w and v array. Remove redundant assignment to V(1). Move alignment checks at the end of parallel region.	2022-03-10 18:43:50 +00:00
Tom de Vries	f07178ca3c	[nvptx] Disable warp sync in simt region I ran into a hang for this code: ... #pragma omp target map(tofrom: counter_N0) #pragma omp simd for (int i = 0 ; i < 1 ; i++ ) { #pragma omp atomic update counter_N0 = counter_N0 + 1 ; } ... This has to do with the nature of -muniform-simt. It has two modes of operation: inside and outside an SIMT region. Outside an SIMT region, a warp pretends to execute a single thread, but actually executes in all threads, to keep the local registers in all threads consistent. This approach works unless the insn that is executed is a syscall or an atomic insn. In that case, the insn is predicated, such that it executes in only one thread. If the predicated insn writes a result to a register, then that register is propagated to the other threads, after which the local registers in all threads are consistent again. Inside an SIMT region, a warp executes in all threads. However, the predication and propagation for syscalls and atomic insns is also present here, because nvptx_reorg_uniform_simt works on all code. Care has been taken though to ensure that the predication and propagation is a nop. That is, inside an SIMT region: - the predicate evaluates to true for each thread, and - the propagation insn copies a register from each thread to the same thread. That works fine, until we use -mptx=6.0, and instead of using the deprecated warp propagation insn shfl, we start using shfl.sync: ... @%r33 atom.add.u32 _, [%r29], 1; shfl.sync.idx.b32 %r30, %r30, %r32, 31, 0xffffffff; ... The shfl.sync specifies a member mask indicating all threads, but given that the loop only has a single iteration, only thread 0 will execute the insn, where it will hang waiting for the other threads. Fix this by predicating the shfl.sync (and likewise, bar.warp.sync and the uniform warp check) such that it only executes outside the SIMT region. Tested on x86_64 with nvptx accelerator. 
gcc/ChangeLog: 2022-03-08 Tom de Vries <tdevries@suse.de> PR target/104783 * config/nvptx/nvptx.cc (nvptx_init_unisimt_predicate) (nvptx_output_unisimt_switch): Handle unisimt_outside_simt_predicate. (nvptx_get_unisimt_outside_simt_predicate): New function. (predicate_insn): New function, factored out of ... (nvptx_reorg_uniform_simt): ... here. Predicate all emitted insns. * config/nvptx/nvptx.h (struct machine_function): Add unisimt_outside_simt_predicate field. * config/nvptx/nvptx.md (define_insn "nvptx_warpsync") (define_insn "nvptx_uniform_warp_check"): Make predicable. libgomp/ChangeLog: 2022-03-10 Tom de Vries <tdevries@suse.de> * testsuite/libgomp.c/pr104783.c: New test.	2022-03-10 12:20:44 +01:00
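The fragment quoted in the commit above can be wrapped into a small function for experimentation. This is a minimal sketch, not the contents of the committed `pr104783.c` test; the function name `run_counter` is hypothetical. Built without `-fopenmp` the pragmas are ignored and the loop runs serially on the host; built with offloading to nvptx, it exercises the single-iteration `simd` loop with an atomic update that the message describes.

```c
/* Hypothetical wrapper around the fragment from the commit message.
   The loop has exactly one iteration, so one increment is expected.  */
int run_counter(void)
{
  int counter_N0 = 0;

#pragma omp target map(tofrom: counter_N0)
#pragma omp simd
  for (int i = 0; i < 1; i++)
    {
#pragma omp atomic update
      counter_N0 = counter_N0 + 1;
    }

  return counter_N0;
}
```

Calling `run_counter()` from `main` and checking for a result of 1 is enough to tell a hang or a lost update from correct execution.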
Thomas Schwinge	7a5e036b61	[OpenACC privatization] Analyze 'lookup_decl'-translated DECL [PR90115, PR102330, PR104774] ... so that it matches what we analyze and what we action on. Fix-up for commit `29a2f51806` "openacc: Add support for gang local storage allocation in shared memory [PR90115]". PR middle-end/90115 PR middle-end/102330 PR middle-end/104774 gcc/ * omp-low.cc (oacc_privatization_candidate_p) (oacc_privatization_scan_clause_chain) (oacc_privatization_scan_decl_chain, lower_oacc_private_marker): Analyze 'lookup_decl'-translated DECL. gcc/testsuite/ * c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Adjust. * c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104774-1.c: Likewise. * c-c++-common/goacc/privatization-1-compute-loop.c: Likewise. * c-c++-common/goacc/privatization-1-compute.c: Likewise. * c-c++-common/goacc/privatization-1-routine_gang-loop.c: Likewise. * c-c++-common/goacc/privatization-1-routine_gang.c: Likewise. * gfortran.dg/goacc-gomp/pr102330-1.f90: Likewise, and subsume... * gfortran.dg/goacc-gomp/pr102330-2.f90: ... this file, and... * gfortran.dg/goacc-gomp/pr102330-3.f90: ... this file. * gfortran.dg/goacc/privatization-1-compute-loop.f90: Adjust. * gfortran.dg/goacc/privatization-1-compute.f90: Likewise. * gfortran.dg/goacc/privatization-1-routine_gang-loop.f90: Likewise. * gfortran.dg/goacc/privatization-1-routine_gang.f90: Likewise. libgomp/ * testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Enhance. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c: Likewise. 
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c: Likewise. * testsuite/libgomp.oacc-fortran/optional-private.f90: Likewise. * testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: Likewise. * testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Likewise.	2022-03-10 12:06:28 +01:00
Thomas Schwinge	1d9dc3dd74	Enhance further testcases to verify handling of OpenACC privatization level [PR90115] As originally introduced in commit `11b8286a83` "[OpenACC privatization] Largely extend diagnostics and corresponding testsuite coverage [PR90115]". PR middle-end/90115 gcc/testsuite/ * c-c++-common/goacc/nesting-1.c: Enhance. * gcc.dg/goacc/nested-function-1.c: Likewise. * gcc.dg/goacc/nested-function-2.c: Likewise. * gfortran.dg/goacc/nested-function-1.f90: Likewise. libgomp/ * testsuite/libgomp.oacc-fortran/routine-1.f90: Enhance. * testsuite/libgomp.oacc-fortran/routine-2.f90: Likewise. * testsuite/libgomp.oacc-fortran/routine-3.f90: Likewise. * testsuite/libgomp.oacc-fortran/routine-9.f90: Likewise.	2022-03-10 11:24:07 +01:00
Thomas Schwinge	14dfbb5359	Fix 'libgomp.oacc-c-c++-common/kernels-decompose-1.c' expected diagnostics Fix-up for recent commit `8935589b49` "OMP lowering: Regimplify 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs [PR100280, PR104132, PR104133]": adjust for a GCN offloading workaround added just before commit: '(volatile void *) &f1;'. PR testsuite/104791 libgomp/ * testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Fix expected diagnostics.	2022-03-04 20:42:29 +01:00
Thomas Schwinge	e28eb86c18	Test 'libgomp.oacc-*/kernels-private-vars-*' with '--param=openacc-kernels=decompose' [PR104784] Before recent commit `8935589b49` "OMP lowering: Regimplify 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs [PR100280, PR104132, PR104133]", 'libgomp.oacc-c' testing already worked fine, but 'libgomp.oacc-c++' testing ICEed. Via the commit mentioned, the C++ testing ICEs are now resolved, but the underlying issue remains to be looked into: PR104784 "OpenACC 'kernels' decomposition: C vs. C++ differences". PR middle-end/104784 libgomp/ * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c: Test with '--param=openacc-kernels=decompose'. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-3.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-4.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-5.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-6.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c: Likewise. 
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-2.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-2.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-3.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-4.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-5.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-6.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-7.f90: Likewise.	2022-03-04 15:47:06 +01:00
Thomas Schwinge	07395f19df	Test '-fopt-info-omp-all' in 'libgomp.oacc-*/kernels-private-vars-*' libgomp/ * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c: Test '-fopt-info-omp-all'. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-3.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-4.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-5.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-6.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90: Likewise. 
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-2.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-2.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-3.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-4.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-5.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-6.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-7.f90: Likewise.	2022-03-04 14:47:19 +01:00
Thomas Schwinge	8935589b49	OMP lowering: Regimplify 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs [PR100280, PR104132, PR104133] ... by generalizing the existing 'gcc/omp-low.cc:task_shared_vars'. Fix-up for commit `9b32c1669a` "OpenACC 'kernels' decomposition: Mark variables used in synthesized data clauses as addressable [PR100280]". PR middle-end/100280 PR middle-end/104132 PR middle-end/104133 gcc/ * omp-low.cc (task_shared_vars): Rename to 'make_addressable_vars'. Adjust all users. (scan_sharing_clauses) <OMP_CLAUSE_MAP> Use it for 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs, too. gcc/testsuite/ * c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Adjust. * c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise. libgomp/ * testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Extend.	2022-03-04 14:21:01 +01:00
Thomas Schwinge	de6e81ea96	OpenACC 'kernels' decomposition: Move 'TREE_ADDRESSABLE' setting into OMP lowering [PR100280] ... in preparation for later changes. No functional change. Follow-up to commit `9b32c1669a` "OpenACC 'kernels' decomposition: Mark variables used in synthesized data clauses as addressable [PR100280]". PR middle-end/100280 gcc/ * tree.h (OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE): New. * tree-core.h: Document it. * omp-low.cc (scan_sharing_clauses) <OMP_CLAUSE_MAP>: Handle 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE'. * omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region): Set 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' instead of 'TREE_ADDRESSABLE'. gcc/testsuite/ * c-c++-common/goacc/classify-kernels-unparallelized.c: Adjust. * c-c++-common/goacc/classify-kernels.c: Likewise. * c-c++-common/goacc/kernels-decompose-2.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr100280-1.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104061-1-2.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise. libgomp/ * testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Likewise.	2022-03-04 14:21:01 +01:00
Thomas Schwinge	e5ae22c561	Add diagnostic: "note: OpenACC 'kernels' decomposition: variable '[...]' declared in block made addressable" [PR100280] Follow-up to commit `9b32c1669a` "OpenACC 'kernels' decomposition: Mark variables used in synthesized data clauses as addressable [PR100280]". PR middle-end/100280 gcc/ * omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region): Add diagnostic: "note: OpenACC 'kernels' decomposition: variable '[...]' declared in block made addressable". gcc/testsuite/ * c-c++-common/goacc/classify-kernels-unparallelized.c: Add '--param=openacc-privatization=noisy'. * c-c++-common/goacc/classify-kernels.c: Likewise. * c-c++-common/goacc/kernels-decompose-2.c: Adjust. * c-c++-common/goacc/kernels-decompose-pr100280-1.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104061-1-2.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise. libgomp/ * testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Likewise.	2022-03-04 14:21:00 +01:00
Tom de Vries	f485b0ed7d	[libgomp, testsuite, nvptx] Add -mptx=_ in declare-variant-3-sm*.c When running with target board unix/-foffload=-mptx=3.1, we run into: ... lto1: error: PTX version (-mptx) needs to be at least 4.2 to support \ selected -misa (sm_53)^M mkoffload: fatal error: x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned \ 1 exit status^M compilation terminated.^M ... FAIL: libgomp.c/declare-variant-3-sm53.c (test for excess errors) ... Fix this by adding -foffload=-mptx=_ in the libgomp.c/declare-variant-3-sm*.c test-cases. Tested on x86_64 with nvptx accelerator. libgomp/ChangeLog: 2022-02-28 Tom de Vries <tdevries@suse.de> * testsuite/libgomp.c/declare-variant-3-sm30.c: Add -foffload=-mptx=_. * testsuite/libgomp.c/declare-variant-3-sm35.c: Same. * testsuite/libgomp.c/declare-variant-3-sm53.c: Same. * testsuite/libgomp.c/declare-variant-3-sm70.c: Same. * testsuite/libgomp.c/declare-variant-3-sm75.c: Same. * testsuite/libgomp.c/declare-variant-3-sm80.c: Same.	2022-02-28 10:10:51 +01:00
Tom de Vries	59b8ade887	[libgomp, testsuite, nvptx] Add libgomp.c/declare-variant-3-sm*.c Add openmp test-cases that test the omp declare variant construct: ... #pragma omp declare variant (f30) match (device={isa("sm_30")}) ... using the available nvptx isas. Only the one for sm_30 is a dg-do run test-case, the other ones are dg-do link. Tested on x86_64 with nvptx accelerator. libgomp/ChangeLog: 2022-02-24 Tom de Vries <tdevries@suse.de> * testsuite/libgomp.c/declare-variant-3-sm30.c: New test. * testsuite/libgomp.c/declare-variant-3-sm35.c: New test. * testsuite/libgomp.c/declare-variant-3-sm53.c: New test. * testsuite/libgomp.c/declare-variant-3-sm70.c: New test. * testsuite/libgomp.c/declare-variant-3-sm75.c: New test. * testsuite/libgomp.c/declare-variant-3-sm80.c: New test. * testsuite/libgomp.c/declare-variant-3.h: New header file.	2022-02-24 11:41:03 +01:00
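For readers unfamiliar with the construct named in the commit above, the following is a minimal sketch of how the quoted `declare variant` pragma attaches to a base function. It is an illustration under stated assumptions, not the contents of `declare-variant-3.h`: the base function name `f` and the return values are hypothetical, and only `f30` and the `match` clause come from the commit message.

```c
/* Variant intended for devices whose ISA is sm_30 (name taken from the
   pragma quoted in the commit message; the return value is made up).  */
int f30(void)
{
  return 30;
}

/* Hypothetical base function: calls to f may be redirected to f30 when
   the device context matches isa("sm_30"); otherwise f itself runs.  */
#pragma omp declare variant (f30) match (device={isa("sm_30")})
int f(void)
{
  return 0;
}
```

On the host, where the `sm_30` selector does not match, a call to `f()` runs the base function; without `-fopenmp` the pragma is ignored entirely.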
Thomas Schwinge	f8187b5c0d	Fix OpenACC gang-redundant execution in 'libgomp.oacc-fortran/privatized-ref-2.f90' This was a latent problem, and this commit here now resolves a regression that after recent commit `a78b1ab1df` "amdgcn: Tune default OpenMP/OpenACC GPU utilization" we had (only) seen on a GCN offloading '-march=gfx908' system: {+WARNING: program timed out.+} [-PASS:-]{+FAIL:+} libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O0 execution test Same for other optimization levels. Make sure that we're not executing non-parallelized code in gang-redundant mode, by putting these parts into their own 'parallel' constructs, which then default to 'num_gangs(1)'. libgomp/ * testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Fix OpenACC gang-redundant execution.	2022-02-22 17:32:03 +01:00
Tom de Vries	5ed77fb3ed	[libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end Consider the following omp fragment. ... #pragma omp target #pragma omp parallel num_threads (2) #pragma omp task ; ... This hangs at -O0 for nvptx. Investigating the behaviour gives us the following trace of events: - both threads execute GOMP_task, where they: - deposit a task, and - execute gomp_team_barrier_wake - thread 1 executes gomp_team_barrier_wait_end and, not being the last thread, proceeds to wait at the team barrier - thread 0 executes gomp_team_barrier_wait_end and, being the last thread, it calls gomp_barrier_handle_tasks, where it: - executes both tasks and marks the team barrier done - executes a gomp_team_barrier_wake which wakes up thread 1 - thread 1 exits the team barrier - thread 0 returns from gomp_barrier_handle_tasks and goes to wait at the team barrier. - thread 0 hangs. To understand why there is a hang here, it's good to understand how things are set up for nvptx. The libgomp/config/nvptx/bar.c implementation is a copy of the libgomp/config/linux/bar.c implementation, with uses of both futex_wake and do_wait replaced with uses of ptx insn bar.sync: ... if (bar->total > 1) asm ("bar.sync 1, %0;" : : "r" (32 * bar->total)); ... The point where thread 0 goes to wait at the team barrier, corresponds in the linux implementation with a do_wait. In the linux case, the call to do_wait doesn't hang, because it's waiting for bar->generation to become a certain value, and if bar->generation already has that value, it just proceeds, without any need for coordination with other threads. In the nvptx case, the bar.sync waits until thread 1 joins it in the same logical barrier, which never happens: thread 1 is lingering in the thread pool at the thread pool barrier (using a different logical barrier), waiting to join a new team. The easiest way to fix this is to revert to the posix implementation for bar.{c,h}. 
That however falls back on a busy-waiting approach, and does not take advantage of the ptx bar.sync insn. Instead, we revert to the linux implementation for bar.c, and implement bar.c local functions futex_wait and futex_wake using the bar.sync insn. The bar.sync insn takes an argument specifying how many threads are participating, and that doesn't play well with the futex syntax where it's not clear in advance how many threads will be woken up. This is solved by waking up all waiting threads each time a futex_wait or futex_wake happens, and possibly going back to sleep with an updated thread count. Tested libgomp on x86_64 with nvptx accelerator. libgomp/ChangeLog: 2021-04-20 Tom de Vries <tdevries@suse.de> PR target/99555 * config/nvptx/bar.c (generation_to_barrier): New function, copied from config/rtems/bar.c. (futex_wait, futex_wake): New function. (do_spin, do_wait): New function, copied from config/linux/wait.h. (gomp_barrier_wait_end, gomp_barrier_wait_last) (gomp_team_barrier_wake, gomp_team_barrier_wait_end): (gomp_team_barrier_wait_cancel_end, gomp_team_barrier_cancel): Remove and replace with include of config/linux/bar.c. * config/nvptx/bar.h (gomp_barrier_t): Add fields waiters and lock. (gomp_barrier_init): Init new fields. * testsuite/libgomp.c-c++-common/task-detach-6.c: Remove nvptx-specific workarounds. * testsuite/libgomp.c/pr99555-1.c: Same. * testsuite/libgomp.fortran/task-detach-6.f90: Same.	2022-02-22 15:48:03 +01:00
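The difference the commit above describes, between waiting on a generation counter and waiting at a rendezvous-style `bar.sync`, can be illustrated with a toy model. This is a simplified sketch under stated assumptions, not libgomp code: `struct toy_barrier`, `toy_wake`, and `toy_wait` are hypothetical names, and the wait is a plain spin rather than a futex.

```c
#include <stdatomic.h>

/* Hypothetical, simplified barrier state; not libgomp's gomp_barrier_t.  */
struct toy_barrier
{
  atomic_uint generation;
};

/* Mark the barrier as done by advancing the generation counter.  */
static void toy_wake(struct toy_barrier *bar)
{
  atomic_fetch_add_explicit(&bar->generation, 1u, memory_order_release);
}

/* Wait until the generation differs from the value the caller last saw.
   Returns the number of polls that found the old value.  If the
   generation has already advanced, the loop body never runs and the
   caller proceeds without coordinating with any other thread.  */
static unsigned toy_wait(struct toy_barrier *bar, unsigned seen)
{
  unsigned polls = 0;
  while (atomic_load_explicit(&bar->generation, memory_order_acquire) == seen)
    polls++;
  return polls;
}
```

A late waiter that calls `toy_wait` after `toy_wake` has already run returns immediately. A rendezvous primitive has no such memory of past wake-ups, which is the situation thread 0 ends up in with the `bar.sync`-only implementation.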
Tom de Vries	6263b656c8	[libgomp, testsuite, nvptx] Fix pr96390.c without CUDA When running the libgomp testsuite on x86_64 with nvptx accelerator, we run into: ... XPASS: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors) FAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c execution test ... The problem is that we're expecting the following ptxas error: ... XFAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors) Excess errors: ptxas /tmp/ccZYDw8N.o, line 90; error : Call to 'baz' requires call prototype ptxas /tmp/ccZYDw8N.o, line 90; error : Unknown symbol 'baz' ... But it's not triggered because ptxas is not in the path, so nvptx-none-as defaults to --no-verify. So instead, we run into the same error at execution time. Fix this by forcing verification using: ... /* { dg-additional-options "-foffload=-Wa,--verify" \ { target offload_target_nvptx } } */ ... such that we run into the xfail in this way instead: ... XFAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors) Excess errors: nvptx-as: error trying to exec 'ptxas': execvp: No such file or directory nvptx-as: ptxas returned 255 exit status ... Tested on x86_64-linux with nvptx accelerator. libgomp/ChangeLog: 2022-02-21 Tom de Vries <tdevries@suse.de> PR testsuite/104146 * testsuite/libgomp.c++/pr96390.C: Add additional-option -foffload=-Wa,--verify for nvptx. * testsuite/libgomp.c-c++-common/pr96390.c: Same.	2022-02-22 10:23:20 +01:00
Tobias Burnus	3939c1b112	Fortran/OpenMP: Fix depend-clause handling gcc/fortran/ChangeLog: * trans-openmp.cc (gfc_trans_omp_clauses, gfc_trans_omp_depobj): Depend on the proper addr, for ptr/alloc depend on pointee. libgomp/ChangeLog: * testsuite/libgomp.fortran/depend-4.f90: New test. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/depend-4.f90: New test. * gfortran.dg/gomp/depend-5.f90: New test.	2022-02-15 12:26:48 +01:00


1013 Commits