Fix-up for recent commit r13-116-g3f8c389fe90bf565a6221a46bb7fb745dd4c1510
"OpenMP: Fix use_device_{addr,ptr} with in-data-sharing arg", where we
currently get:
libgomp: use_device_ptr pointer wasn't mapped
FAIL: libgomp.fortran/use_device_addr-5.f90 -O execution test
libgomp/
* testsuite/libgomp.fortran/use_device_addr-5.f90: Fix up
multi-device testing.
For array-descriptor vars, the descriptor is assigned to a temporary. However,
this failed when the clause's argument was in turn in a data-sharing clause
as the outer context's VALUE_EXPR wasn't used.
gcc/ChangeLog:
* omp-low.cc (lower_omp_target): Fix use_device_{addr,ptr} with list
item that is in an outer data-sharing clause.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/use_device_addr-5.f90: New test.
So that move_sese_region_to_fn works properly, OpenMP/OpenACC constructs
for which that function is invoked need an extra artificial BIND_EXPR
around their body so that we move all variables of the bodies.
The C/C++ FEs do that both for OpenMP constructs like OMP_PARALLEL, OMP_TASK
or OMP_TARGET and for OpenACC constructs that behave similarly to
OMP_TARGET, but the Fortran FE only does that for OpenMP constructs.
The following patch does that for OpenACC constructs too.
PR fortran/104717
gcc/fortran/
* trans-openmp.cc (gfc_trans_oacc_construct): Wrap construct body
in an extra BIND_EXPR.
gcc/testsuite/
* gfortran.dg/goacc/pr104717.f90: New test.
* gfortran.dg/goacc/privatization-1-compute-loop.f90: Adjust.
libgomp/
* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Adjust.
Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
This patch fixes a bug in lower_omp_target, where for Fortran arrays,
the expanded sender assignment wrongly uses the variable in the
current ctx instead of the one looked up outside, which causes
use_device_ptr/addr to fail when used inside an omp parallel
(where the omp child_fn is split away from the original).
The fix is inside omp-low.cc, though because the omp_array_data langhook
is used only by Fortran, this is essentially Fortran-specific.
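For illustration, here is a C analogue of the construct nesting involved; the
actual failing case is the new Fortran testcase, where the array's descriptor
goes through the omp_array_data langhook (names and sizes below are made up):
...
#include <stdlib.h>

int
main (void)
{
  int *p = (int *) malloc (100 * sizeof (int));
#pragma omp target enter data map(to: p[0:100])
#pragma omp parallel num_threads(2)
  {
    /* 'p' is referenced from the split-off parallel child function.  */
#pragma omp target data use_device_ptr(p)
    {
      /* Within this region, 'p' holds the corresponding device pointer.  */
    }
  }
#pragma omp target exit data map(release: p[0:100])
  free (p);
  return 0;
}
...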
2022-04-05 Chung-Lin Tang <cltang@codesourcery.com>
gcc/ChangeLog:
* omp-low.cc (lower_omp_target): Use outer context looked-up 'var' as
argument to lang_hooks.decls.omp_array_data, instead of 'ovar' from
current clause.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/use_device_ptr-4.f90: New testcase.
The test-cases libgomp.fortran/examples-4/declare_target-{1,2}.f90 mean to
set an nvptx-specific limit using offload_target_nvptx, but also change
behaviour for AMD.
That is, there is now a difference in behaviour between:
- a compiler configured for GCN offloading, and
- a compiler configured for both GCN and nvptx offloading.
Fix this by using on_device_arch_nvptx instead.
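Schematically, the difference is between a compile-time check (is the compiler
configured for nvptx offloading?) and a run-time check (is this execution
actually on an nvptx device?).  A minimal C sketch, assuming the testsuite
helper on_device_arch_nvptx from on_device_arch.h (the depth values are made
up):
...
extern int on_device_arch_nvptx (void);

static int
max_depth (void)
{
  /* Limit only when actually executing on an nvptx device, not merely
     when the compiler supports nvptx offloading.  */
  return on_device_arch_nvptx () ? 22 : 25;
}
...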
Tested on x86_64 with nvptx accelerator.
libgomp/ChangeLog:
2022-04-04 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.fortran/examples-4/declare_target-1.f90: Use
on_device_arch_nvptx instead of offload_target_nvptx.
* testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Same.
When running testcases libgomp.fortran/examples-4/declare_target-{1,2}.f90 on
an RTX A2000 (sm_86) with driver 510.60.02 and with GOMP_NVPTX_JIT=-O0 I run
into:
...
FAIL: libgomp.fortran/examples-4/declare_target-1.f90 -O0 \
-DGOMP_NVPTX_JIT=-O0 execution test
FAIL: libgomp.fortran/examples-4/declare_target-2.f90 -O0 \
-DGOMP_NVPTX_JIT=-O0 execution test
...
Fix this by further limiting recursion depth in the test-cases for nvptx.
Furthermore, make the recursion depth limiting nvptx-specific.
Tested on x86_64 with nvptx accelerator.
libgomp/ChangeLog:
2022-04-01 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.fortran/examples-4/declare_target-1.f90: Define
and use REC_DEPTH.
* testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Same.
When running test-case libgomp.oacc-c-c++-common/vector-length-128-7.c on an
RTX A2000 (sm_86) with driver 510.60.02 I run into:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-7.c \
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 \
output pattern test
...
The failing check verifies the launch dimensions:
...
/* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: \
launch gangs=1, workers=8, vectors=128" } */
...
which fails because (as we can see with GOMP_DEBUG=1) the actual num_workers
is 6:
...
nvptx_exec: kernel main$_omp_fn$0: launch gangs=1, workers=6, vectors=128
...
This is due to the result of cuOccupancyMaxPotentialBlockSize (which suggests
'a launch configuration with reasonable occupancy') printed just before:
...
cuOccupancyMaxPotentialBlockSize: grid = 52, block = 768
...
[ Note: 6 * 128 == 768. ]
Fix this by updating the check to allow num_workers in the range 1 to 8.
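One way to express that range in the dg-output pattern (illustrative; the
committed pattern may differ in detail):
...
/* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: \
   launch gangs=1, workers=(1|2|3|4|5|6|7|8), vectors=128" } */
...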
Tested on x86_64 with nvptx accelerator.
libgomp/ChangeLog:
2022-04-01 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Fix
num_workers check.
When a display manager is running on an nvidia card, all CUDA kernel launches
get a 5-second watchdog timer.
Consequently, when running the libgomp testsuite with nvptx accelerator and
GOMP_NVPTX_JIT=-O0 we run into a few FAILs like this:
...
libgomp: cuStreamSynchronize error: the launch timed out and was terminated
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 \
execution test
...
Fix this by scaling down the failing test-cases by default, and reverting to
the original behaviour for GCC_TEST_RUN_EXPENSIVE=1.
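A minimal sketch of such scaling (assuming the generic run_expensive_tests
effective target, which reflects GCC_TEST_RUN_EXPENSIVE=1; the macro and the
iteration counts are illustrative, not the exact committed changes):
...
/* { dg-additional-options "-DEXPENSIVE" { target run_expensive_tests } } */

#ifdef EXPENSIVE
#define N 100000  /* original, long-running iteration count */
#else
#define N 50      /* scaled-down default, to stay below the watchdog timer */
#endif
...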
Tested on x86_64-linux with nvptx accelerator.
libgomp/ChangeLog:
2022-03-25 Tom de Vries <tdevries@suse.de>
PR libgomp/105042
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Reduce
execution time.
* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Same.
* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Same.
gcc/lto/ChangeLog:
PR middle-end/104285
* lto-partition.cc (maybe_rewrite_identifier): Use get_identifier
for the returned string to be usable as hash key.
(validize_symbol_for_target): Hence, use return value directly.
(privatize_symbol_name_1): Track maybe_rewrite_identifier renames.
* lto.cc (offload_handle_link_vars): Move function up before ...
(do_whole_program_analysis): Call it after static renamings.
(lto_main): Move call after static renamings.
libgomp/ChangeLog:
PR middle-end/104285
* testsuite/libgomp.c++/target-same-name-2-a.C: New test.
* testsuite/libgomp.c++/target-same-name-2-b.C: New test.
* testsuite/libgomp.c++/target-same-name-2.C: New test.
* testsuite/libgomp.c-c++-common/target-same-name-1-a.c: New test.
* testsuite/libgomp.c-c++-common/target-same-name-1-b.c: New test.
* testsuite/libgomp.c-c++-common/target-same-name-1.c: New test.
Consider this code (with N defined to 1024):
...
float v = 0.0;
#pragma omp target map(tofrom: v)
#pragma omp parallel for simd
for (int i = 0 ; i < N; i++)
{
#pragma omp atomic update
v = v + 1.0;
}
...
It hangs when executing on target board unix/-foffload=-misa=sm_75, using
drivers 470.103.01 and 510.54 on a T400 board (sm_75).
I'm tentatively identifying the problem as a bug in -muniform-simt for
architectures that support Independent Thread Scheduling (sm_70 and later).
The problem -muniform-simt is trying to address is to make sure that a
register produced outside an OpenMP simd region is available when used in any
lane inside a simd region.
The solution is to, outside a simd region, execute in all warp lanes, thus
producing consistent values in the result registers in each warp thread.
This approach doesn't work when executing in all warp lanes would multiply a
single side effect into 32 separate side effects, which is the case for atomic
insns. So atomic insns are rewritten to execute only in lane 0, and if
there are any results, those are propagated to the other threads in the warp.
[ And likewise for system calls malloc, free, vprintf. ]
Now, consider a non-atomic update: ld, add, store. The store has side
effects; are those multiplied or not?
Pre-sm_70 we can assume that at the end of an SIMT region, any divergent
control flow has reconverged, and we have a uniform warp, executing in lock
step. So:
- the load will load the same value into the result register across the warp,
- the add will write the same value into the result register across the warp,
- the store will write the same value to the same memory location, 32 times,
at once, with the net effect of a single store.
So, no side-effect multiplication (well, at least that's the observation).
Starting sm_70, the threads in a warp are no longer guaranteed to reconverge
after divergence. There's a "Convergence Optimizer" that can identify
that it is safe for a warp to reconverge, but that works only as long as the
code does not contain "synchronizing operations".
Consequently, the ld, add, store sequence can be executed by a non-uniform
warp, which means the side effects may have been multiplied, and the registers
are no longer guaranteed to be in sync.
The atomic update in the example above is translated using an atom.cas loop,
which means that we have divergence (because only one thread is allowed to
succeed at a time) and the "Convergence Optimizer" doesn't reconverge, probably
because the atom.cas counts as a "synchronizing operation". So, it seems
plausible that the root cause for the mentioned hang is the problem described
above.
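For reference, the compare-and-swap loop that such an atomic float update
lowers to has roughly this shape (a host-side C sketch using GCC atomic
builtins, not the actual atom.cas PTX):
...
static void
atomic_add_float (float *p, float inc)
{
  union { float f; unsigned int u; } old, desired;
  old.f = *p;
  /* Only one thread's CAS can succeed per iteration, so the participating
     threads diverge until each of them has published its update.  */
  do
    desired.f = old.f + inc;
  while (!__atomic_compare_exchange_n ((unsigned int *) p, &old.u, desired.u,
                                       false, __ATOMIC_RELAXED,
                                       __ATOMIC_RELAXED));
}
...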
Fix this by adding an explicit warp sync at simt exit.
Note that we're assuming here that the warp will stay uniform until the next
SIMT region entry.
Tested on x86_64 with nvptx accelerator.
gcc/ChangeLog:
2022-03-09 Tom de Vries <tdevries@suse.de>
PR target/104916
PR target/104783
* config/nvptx/nvptx.md (define_expand "omp_simt_exit"): Emit warp
sync (or uniform warp check for -mptx < 6.0).
libgomp/ChangeLog:
2022-03-15 Tom de Vries <tdevries@suse.de>
PR target/104916
PR target/104783
* testsuite/libgomp.c/pr104783-2.c: New test.
gfc_omp_predetermined_sharing causes the associate-name pointer variable
to be OMP_CLAUSE_DEFAULT_FIRSTPRIVATE, which is fine. However, the associated
selector is shared. Thus, the target of the associate-name pointer should not
get copied. (It was copied before, but because gfc_omp_privatize_by_reference
returned false, the selector was not only wrongly copied, the copying was also
not done properly.)
gcc/fortran/ChangeLog:
PR fortran/103039
* trans-openmp.cc (gfc_omp_clause_copy_ctor, gfc_omp_clause_dtor):
Only privatize pointer for associate names.
libgomp/ChangeLog:
PR fortran/103039
* testsuite/libgomp.fortran/associate4.f90: New test.
Consider test-case pr104952-1.c, included in this commit, containing:
...
#pragma omp target map(tofrom:result) map(to:arr)
#pragma omp simd reduction(||: result)
...
When run on x86_64 with nvptx accelerator, the test-case either aborts or
hangs.
The reduction clause is translated by the SIMT code (active for nvptx) as a
butterfly reduction loop with this butterfly shuffle / update pair:
...
D.2163 = D.2163 || .GOMP_SIMT_XCHG_BFLY (D.2163, D.2164)
...
in the loop body.
The problem is that the butterfly shuffle is possibly not executed, while it
needs to be executed unconditionally.
Fix this by translating instead as:
...
D.tmp_bfly = .GOMP_SIMT_XCHG_BFLY (D.2163, D.2164)
D.2163 = D.2163 || D.tmp_bfly
...
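To illustrate why: in a butterfly reduction every lane has to take part in
every shuffle step, so only the combination may be short-circuited, never the
exchange itself (shfl_xor below is a hypothetical warp-shuffle primitive
standing in for .GOMP_SIMT_XCHG_BFLY):
...
extern int shfl_xor (int val, int mask);  /* hypothetical warp shuffle */

static int
warp_reduce_or (int val)
{
  /* Butterfly over a 32-lane warp.  */
  for (int mask = 16; mask > 0; mask >>= 1)
    {
      int other = shfl_xor (val, mask);  /* must be executed by all lanes */
      val = val || other;                /* only this part may short-circuit */
    }
  return val;
}
...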
Tested on x86_64-linux with nvptx accelerator.
gcc/ChangeLog:
2022-03-17 Tom de Vries <tdevries@suse.de>
PR target/104952
* omp-low.cc (lower_rec_input_clauses): Make sure GOMP_SIMT_XCHG_BFLY
is executed unconditionally.
libgomp/ChangeLog:
2022-03-17 Tom de Vries <tdevries@suse.de>
PR target/104952
* testsuite/libgomp.c/pr104952-1.c: New test.
* testsuite/libgomp.c/pr104952-2.c: New test.
This patch fixes a small bug in the omp_set_num_teams implementation.
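Roughly, the corrected Fortran wrapper forwards to the matching C entry point
(a sketch, not a verbatim copy of libgomp/fortran.c):
...
#include <omp.h>
#include <stdint.h>

void
omp_set_num_teams_8_ (const int64_t *num_teams)
{
  /* Previously this mistakenly called omp_set_max_active_levels.  */
  omp_set_num_teams ((int) *num_teams);
}
...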
libgomp/ChangeLog:
* fortran.c (omp_set_num_teams_8_): Call omp_set_num_teams instead of
omp_set_max_active_levels.
* testsuite/libgomp.fortran/icv-8.f90: New test.
It's an orthogonal concern why these diagnostics do appear at all for
non-offloaded OpenACC constructs (where they're not relevant at all); PR90115.
Depending on how 'assert' is implemented, it may cause temporaries to be
created, and/or may lower into 'COND_EXPR's, and
'gcc/gimplify.cc:gimplify_cond_expr' uses 'create_tmp_var (type, "iftmp")'.
Fix-up for commit 11b8286a83
"[OpenACC privatization] Largely extend diagnostics and
corresponding testsuite coverage [PR90115]".
PR testsuite/102841
libgomp/
* testsuite/libgomp.oacc-c-c++-common/host_data-7.c: Adjust.
Currently in OpenACC 'kernels' decomposition, there is special handling of
'GOMP_MAP_FORCE_TOFROM', documented to be done to avoid "internal compiler
errors in later passes". For performance reasons, the current repetitive
to/from device copying for every region is not ideal, compared to using
'present' clauses, as done for almost all other 'GOMP_MAP_*'. Also, the
current special handling (incomplete, evidently) is the reason for the PR104892
misbehavior. For PR100280 etc. we've resolved all such known ICEs -- removing
the special handling for 'GOMP_MAP_FORCE_TOFROM' now resolves PR104892.
PR middle-end/100280
PR middle-end/104892
gcc/
* omp-oacc-kernels-decompose.cc (omp_oacc_kernels_decompose_1):
Remove special handling of 'GOMP_MAP_FORCE_TOFROM'.
gcc/testsuite/
* c-c++-common/goacc/kernels-decompose-2.c: Adjust.
* c-c++-common/goacc/kernels-decompose-pr100400-1-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr100400-1-2.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr100400-1-3.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr100400-1-4.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104061-1-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104061-1-2.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104774-1.c: Likewise.
* gfortran.dg/goacc/classify-kernels.f95: Likewise.
* gfortran.dg/goacc/kernels-decompose-2.f95: Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
* testsuite/libgomp.oacc-fortran/asyncwait-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90:
Likewise.
Document a few examples of the status quo.
PR middle-end/104892
libgomp/
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Point
to PR104892.
* testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise,
enable '--param=openacc-kernels=decompose' and adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
* testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90:
Likewise.
... like in recent commit 9b32c1669a
"OpenACC 'kernels' decomposition: Mark variables used in synthesized
data clauses as addressable [PR100280]". Otherwise, we may run into
'gcc/omp-low.cc:lower_omp_target':
13125 else if (is_gimple_reg (var))
13126 {
13127 gcc_assert (offloaded);
PR middle-end/100280
PR middle-end/104086
gcc/
* omp-oacc-kernels-decompose.cc (omp_oacc_kernels_decompose_1):
Mark variables used in 'present' clauses as addressable.
* omp-low.cc (scan_sharing_clauses) <OMP_CLAUSE_MAP>: Gracefully
handle duplicate 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE'.
gcc/testsuite/
* c-c++-common/goacc/kernels-decompose-pr104086-1.c: Adjust,
extend.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
Merge this...
* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c:
..., and this...
* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: ... into
this, and adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Extend.
1. Thomas reported in
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589039.html
that this testcase was failing randomly. The problem was the fixed pool
size, which was exhausted when there were a lot of threads. Fixed it
by removing the pool_size trait, which causes the default pool size to be
used, which should be big enough (see the C sketch after this list).
2. Array indices have been changed to check the last element in the
array.
3. Remove a redundant assignment and move some code to better match
C testcase.
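For reference, this is roughly how a fixed pool_size trait looks via the C
allocator API (the Fortran testcase uses the corresponding module interface;
the size here is made up):
...
#include <omp.h>

omp_allocator_handle_t
make_fixed_pool_allocator (void)
{
  /* A fixed pool like this can be exhausted when many threads allocate
     from it concurrently; dropping the trait falls back to the default
     pool size.  */
  omp_alloctrait_t traits[] = { { omp_atk_pool_size, 8192 } };
  return omp_init_allocator (omp_default_mem_space, 1, traits);
}
...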
libgomp/ChangeLog:
* testsuite/libgomp.fortran/allocate-1.f90: Remove pool_size
trait. Test last index in w and v array. Remove redundant
assignment to V(1). Move alignment checks to the end of the
parallel region.
I ran into a hang for this code:
...
#pragma omp target map(tofrom: counter_N0)
#pragma omp simd
for (int i = 0 ; i < 1 ; i++ )
{
#pragma omp atomic update
counter_N0 = counter_N0 + 1 ;
}
...
This has to do with the nature of -muniform-simt. It has two modes of
operation: inside and outside an SIMT region.
Outside an SIMT region, a warp pretends to execute a single thread, but
actually executes in all threads, to keep the local registers in all threads
consistent. This approach works unless the insn that is executed is a syscall
or an atomic insn. In that case, the insn is predicated, such that it
executes in only one thread. If the predicated insn writes a result to a
register, then that register is propagated to the other threads, after which
the local registers in all threads are consistent again.
Inside an SIMT region, a warp executes in all threads. However, the
predication and propagation for syscalls and atomic insns is also present
here, because nvptx_reorg_uniform_simt works on all code. Care has been taken
though to ensure that the predication and propagation is a nop. That is,
inside an SIMT region:
- the predicate evaluates to true for each thread, and
- the propagation insn copies a register from each thread to the same thread.
That works fine, until we use -mptx=6.0, and instead of using the deprecated
warp propagation insn shfl, we start using shfl.sync:
...
@%r33 atom.add.u32 _, [%r29], 1;
shfl.sync.idx.b32 %r30, %r30, %r32, 31, 0xffffffff;
...
The shfl.sync specifies a member mask indicating all threads, but given that
the loop only has a single iteration, only thread 0 will execute the insn,
where it will hang waiting for the other threads.
Fix this by predicating the shfl.sync (and likewise, bar.warp.sync and the
uniform warp check) such that it only executes outside the SIMT region.
Tested on x86_64 with nvptx accelerator.
gcc/ChangeLog:
2022-03-08 Tom de Vries <tdevries@suse.de>
PR target/104783
* config/nvptx/nvptx.cc (nvptx_init_unisimt_predicate)
(nvptx_output_unisimt_switch): Handle unisimt_outside_simt_predicate.
(nvptx_get_unisimt_outside_simt_predicate): New function.
(predicate_insn): New function, factored out of ...
(nvptx_reorg_uniform_simt): ... here. Predicate all emitted insns.
* config/nvptx/nvptx.h (struct machine_function): Add
unisimt_outside_simt_predicate field.
* config/nvptx/nvptx.md (define_insn "nvptx_warpsync")
(define_insn "nvptx_uniform_warp_check"): Make predicable.
libgomp/ChangeLog:
2022-03-10 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.c/pr104783.c: New test.
... by generalizing the existing 'gcc/omp-low.cc:task_shared_vars'.
Fix-up for commit 9b32c1669a
"OpenACC 'kernels' decomposition: Mark variables used in
synthesized data clauses as addressable [PR100280]".
PR middle-end/100280
PR middle-end/104132
PR middle-end/104133
gcc/
* omp-low.cc (task_shared_vars): Rename to
'make_addressable_vars'. Adjust all users.
(scan_sharing_clauses) <OMP_CLAUSE_MAP>: Use it for
'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs, too.
gcc/testsuite/
* c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Adjust.
* c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Extend.
When running with target board unix/-foffload=-mptx=3.1, we run into:
...
lto1: error: PTX version (-mptx) needs to be at least 4.2 to support \
selected -misa (sm_53)^M
mkoffload: fatal error: x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned \
1 exit status^M
compilation terminated.^M
...
FAIL: libgomp.c/declare-variant-3-sm53.c (test for excess errors)
...
Fix this by adding -foffload=-mptx=_ in the libgomp.c/declare-variant-3-sm*.c
test-cases.
Tested on x86_64 with nvptx accelerator.
libgomp/ChangeLog:
2022-02-28 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.c/declare-variant-3-sm30.c: Add -foffload=-mptx=_.
* testsuite/libgomp.c/declare-variant-3-sm35.c: Same.
* testsuite/libgomp.c/declare-variant-3-sm53.c: Same.
* testsuite/libgomp.c/declare-variant-3-sm70.c: Same.
* testsuite/libgomp.c/declare-variant-3-sm75.c: Same.
* testsuite/libgomp.c/declare-variant-3-sm80.c: Same.
Add openmp test-cases that test the omp declare variant construct:
...
#pragma omp declare variant (f30) match (device={isa("sm_30")})
...
using the available nvptx isas.
Only the one for sm_30 is a dg-do run test-case, the other ones are dg-do
link.
Tested on x86_64 with nvptx accelerator.
libgomp/ChangeLog:
2022-02-24 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.c/declare-variant-3-sm30.c: New test.
* testsuite/libgomp.c/declare-variant-3-sm35.c: New test.
* testsuite/libgomp.c/declare-variant-3-sm53.c: New test.
* testsuite/libgomp.c/declare-variant-3-sm70.c: New test.
* testsuite/libgomp.c/declare-variant-3-sm75.c: New test.
* testsuite/libgomp.c/declare-variant-3-sm80.c: New test.
* testsuite/libgomp.c/declare-variant-3.h: New header file.
This was a latent problem; this commit resolves a regression that, after
recent commit a78b1ab1df
"amdgcn: Tune default OpenMP/OpenACC GPU utilization", we had (only) seen on a
GCN offloading '-march=gfx908' system:
{+WARNING: program timed out.+}
[-PASS:-]{+FAIL:+} libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O0 execution test
Same for other optimization levels.
Make sure that we're not executing non-parallelized code in gang-redundant
mode, by putting these parts into their own 'parallel' constructs, which then
default to 'num_gangs(1)'.
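Schematically, in C (the test-case itself is Fortran; names and values are
made up):
...
static void
foo (int *a, int n)
{
  int init;
  /* Non-parallelized set-up code gets its own 'parallel' construct; per the
     above, with no explicit num_gangs clause this defaults to num_gangs(1),
     so it is no longer executed gang-redundantly.  */
#pragma acc parallel copyout(init)
  init = 42;

#pragma acc parallel loop gang copyin(init) copy(a[0:n])
  for (int i = 0; i < n; i++)
    a[i] = init + i;
}
...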
libgomp/
* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Fix OpenACC
gang-redundant execution.
Consider the following omp fragment.
...
#pragma omp target
#pragma omp parallel num_threads (2)
#pragma omp task
;
...
This hangs at -O0 for nvptx.
Investigating the behaviour gives us the following trace of events:
- both threads execute GOMP_task, where they:
- deposit a task, and
- execute gomp_team_barrier_wake
- thread 1 executes gomp_team_barrier_wait_end and, not being the last thread,
proceeds to wait at the team barrier
- thread 0 executes gomp_team_barrier_wait_end and, being the last thread, it
calls gomp_barrier_handle_tasks, where it:
- executes both tasks and marks the team barrier done
- executes a gomp_team_barrier_wake which wakes up thread 1
- thread 1 exits the team barrier
- thread 0 returns from gomp_barrier_handle_tasks and goes to wait at
the team barrier.
- thread 0 hangs.
To understand why there is a hang here, it's good to understand how things
are set up for nvptx. The libgomp/config/nvptx/bar.c implementation is
a copy of the libgomp/config/linux/bar.c implementation, with uses of both
futex_wake and do_wait replaced with uses of ptx insn bar.sync:
...
if (bar->total > 1)
asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
...
The point where thread 0 goes to wait at the team barrier corresponds in
the linux implementation to a do_wait. In the linux case, the call to
do_wait doesn't hang, because it's waiting for bar->generation to become
a certain value, and if bar->generation already has that value, it just
proceeds, without any need for coordination with other threads.
In the nvptx case, the bar.sync waits until thread 1 joins it in the same
logical barrier, which never happens: thread 1 is lingering in the
thread pool at the thread pool barrier (using a different logical barrier),
waiting to join a new team.
The easiest way to fix this is to revert to the posix implementation for
bar.{c,h}. That however falls back on a busy-waiting approach, and
does not take advantage of the ptx bar.sync insn.
Instead, we revert to the linux implementation for bar.c,
and implement bar.c local functions futex_wait and futex_wake using the
bar.sync insn.
The bar.sync insn takes an argument specifying how many threads are
participating, and that doesn't play well with the futex syntax where it's
not clear in advance how many threads will be woken up.
This is solved by waking up all waiting threads each time a futex_wait or
futex_wake happens, and possibly going back to sleep with an updated thread
count.
Tested libgomp on x86_64 with nvptx accelerator.
libgomp/ChangeLog:
2021-04-20 Tom de Vries <tdevries@suse.de>
PR target/99555
* config/nvptx/bar.c (generation_to_barrier): New function, copied
from config/rtems/bar.c.
(futex_wait, futex_wake): New function.
(do_spin, do_wait): New function, copied from config/linux/wait.h.
(gomp_barrier_wait_end, gomp_barrier_wait_last)
(gomp_team_barrier_wake, gomp_team_barrier_wait_end)
(gomp_team_barrier_wait_cancel_end, gomp_team_barrier_cancel): Remove
and replace with include of config/linux/bar.c.
* config/nvptx/bar.h (gomp_barrier_t): Add fields waiters and lock.
(gomp_barrier_init): Init new fields.
* testsuite/libgomp.c-c++-common/task-detach-6.c: Remove nvptx-specific
workarounds.
* testsuite/libgomp.c/pr99555-1.c: Same.
* testsuite/libgomp.fortran/task-detach-6.f90: Same.
When running the libgomp testsuite on x86_64 with nvptx accelerator, we run into:
...
XPASS: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors)
FAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c execution test
...
The problem is that we're expecting the following ptxas error:
...
XFAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors)
Excess errors:
ptxas /tmp/ccZYDw8N.o, line 90; error : Call to 'baz' requires call prototype
ptxas /tmp/ccZYDw8N.o, line 90; error : Unknown symbol 'baz'
...
But it's not triggered because ptxas is not in the path, so nvptx-none-as
defaults to --no-verify.
So instead, we run into the same error at execution time.
Fix this by forcing verification using:
...
/* { dg-additional-options "-foffload=-Wa,--verify" \
{ target offload_target_nvptx } } */
...
such that we run into the xfail in this way instead:
...
XFAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors)
Excess errors:
nvptx-as: error trying to exec 'ptxas': execvp: No such file or directory
nvptx-as: ptxas returned 255 exit status
...
Tested on x86_64-linux with nvptx accelerator.
libgomp/ChangeLog:
2022-02-21 Tom de Vries <tdevries@suse.de>
PR testsuite/104146
* testsuite/libgomp.c++/pr96390.C: Add additional-option
-foffload=-Wa,--verify for nvptx.
* testsuite/libgomp.c-c++-common/pr96390.c: Same.
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_trans_omp_clauses, gfc_trans_omp_depobj):
Depend on the proper addr, for ptr/alloc depend on pointee.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/depend-4.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/depend-4.f90: New test.
* gfortran.dg/gomp/depend-5.f90: New test.
PR c++/102204
gcc/cp/ChangeLog:
* decl2.cc (cp_omp_mappable_type_1): Remove check for virtual
members as those are permitted since OpenMP 5.0.
libgomp/ChangeLog:
* testsuite/libgomp.c++/target-virtual-1.C: New test.
gcc/testsuite/ChangeLog:
* g++.dg/gomp/unmappable-1.C: Remove previously expected dg-message.
This patch adds the 'has_device_addr' clause to the OpenMP 'target' construct
which was introduced in OpenMP 5.1 (OpenMP API 5.1 specification pp. 197ff):
has_device_addr(list)
"The has_device_addr clause indicates that its list items already have device
addresses and therefore they may be directly accessed from a target device.
If the device address of a list item is not for the device on which the target
region executes, accessing the list item inside the region results in
unspecified behavior. The list items may include array sections." (p. 200)
"A list item may not be specified in both an is_device_ptr clause and a
has_device_addr clause on the directive." (p. 202)
"A list item that appears in an is_device_ptr or a has_device_addr clause must
not be specified in any data-sharing attribute clause on the same target
construct." (p. 203)
gcc/c-family/ChangeLog:
* c-omp.cc (c_omp_split_clauses): Added OMP_CLAUSE_HAS_DEVICE_ADDR case.
* c-pragma.h (enum pragma_kind): Added 5.1 in comment.
(enum pragma_omp_clause): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_clause_name): Parse 'has_device_addr'
clause.
(c_parser_omp_variable_list): Handle array sections.
(c_parser_omp_clause_has_device_addr): Added.
(c_parser_omp_all_clauses): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR
case.
(c_parser_omp_target_exit_data): Added HAS_DEVICE_ADDR to
OMP_CLAUSE_MASK.
* c-typeck.cc (handle_omp_array_sections): Handle clause restrictions.
(c_finish_omp_clauses): Handle array sections.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_clause_name): Parse 'has_device_addr' clause.
(cp_parser_omp_var_list_no_open): Handle array sections.
(cp_parser_omp_all_clauses): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR
case.
(cp_parser_omp_target_update): Added HAS_DEVICE_ADDR to OMP_CLAUSE_MASK.
* semantics.cc (handle_omp_array_sections): Handle clause restrictions.
(finish_omp_clauses): Handle array sections.
gcc/fortran/ChangeLog:
* dump-parse-tree.cc (show_omp_clauses): Added OMP_LIST_HAS_DEVICE_ADDR
case.
* gfortran.h: Added OMP_LIST_HAS_DEVICE_ADDR.
* openmp.cc (enum omp_mask2): Added OMP_CLAUSE_HAS_DEVICE_ADDR.
(gfc_match_omp_clauses): Parse HAS_DEVICE_ADDR clause.
(resolve_omp_clauses): Same.
* trans-openmp.cc (gfc_trans_omp_variable_list): Added
OMP_LIST_HAS_DEVICE_ADDR case.
(gfc_trans_omp_clauses): Firstprivatize array descriptors.
gcc/ChangeLog:
* gimplify.cc (gimplify_scan_omp_clauses): Added cases for
OMP_CLAUSE_HAS_DEVICE_ADDR and handle array sections.
(gimplify_adjust_omp_clauses): Added OMP_CLAUSE_HAS_DEVICE_ADDR case.
* omp-low.cc (scan_sharing_clauses): Handle OMP_CLAUSE_HAS_DEVICE_ADDR.
(lower_omp_target): Same.
* tree-core.h (enum omp_clause_code): Same.
* tree-nested.cc (convert_nonlocal_omp_clauses): Same.
(convert_local_omp_clauses): Same.
* tree-pretty-print.cc (dump_omp_clause): Same.
* tree.cc: Same.
libgomp/ChangeLog:
* libgomp.texi: Updated entry for HAS_DEVICE_ADDR.
* target.c (copy_firstprivate_data): Copy only if host address is not
NULL.
* testsuite/libgomp.c++/target-has-device-addr-2.C: New test.
* testsuite/libgomp.c++/target-has-device-addr-4.C: New test.
* testsuite/libgomp.c++/target-has-device-addr-5.C: New test.
* testsuite/libgomp.c++/target-has-device-addr-6.C: New test.
* testsuite/libgomp.c-c++-common/target-has-device-addr-1.c: New test.
* testsuite/libgomp.c/target-has-device-addr-3.c: New test.
* testsuite/libgomp.fortran/target-has-device-addr-1.f90: New test.
* testsuite/libgomp.fortran/target-has-device-addr-2.f90: New test.
* testsuite/libgomp.fortran/target-has-device-addr-3.f90: New test.
* testsuite/libgomp.fortran/target-has-device-addr-4.f90: New test.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/clauses-1.c: Added has_device_addr to test cases.
* g++.dg/gomp/attrs-1.C: Added has_device_addr to test cases.
* g++.dg/gomp/attrs-2.C: Added has_device_addr to test cases.
* c-c++-common/gomp/target-has-device-addr-1.c: New test.
* c-c++-common/gomp/target-has-device-addr-2.c: New test.
* c-c++-common/gomp/target-is-device-ptr-1.c: New test.
* c-c++-common/gomp/target-is-device-ptr-2.c: New test.
* gfortran.dg/gomp/is_device_ptr-3.f90: New test.
* gfortran.dg/gomp/target-has-device-addr-1.f90: New test.
* gfortran.dg/gomp/target-has-device-addr-2.f90: New test.
The following patch fixes crashes with posthumous orphan tasks.
When a parent task finishes, gomp_clear_parent clears the parent
pointers of its children tasks present in the parent->children_queue.
But children that are still waiting for dependencies aren't in that
queue yet, they will be added there only when the sibling they are
waiting for exits. Unfortunately we were adding those tasks into
the queues with the original task->parent which then causes crashes
because that task is gone and freed. The following patch fixes that
by clearing the parent field when we schedule such task for running
by adding it into the queues and we know that the sibling task which
is about to finish has NULL parent.
2022-02-08 Jakub Jelinek <jakub@redhat.com>
PR libgomp/104385
* task.c (gomp_task_run_post_handle_dependers): If parent is NULL,
clear task->parent.
* testsuite/libgomp.c/pr104385.c: New test.
The ptx insn atom doesn't support local memory. In case of doing an atomic
operation on local memory, we run into:
...
operation not supported on global/shared address space
...
This is the cuGetErrorString message for CUDA_ERROR_INVALID_ADDRESS_SPACE.
The message is somewhat confusing given that actually the operation is not
supported on local address space.
Fix this by falling back on a non-atomic version when detecting
a frame-related memory operand.
This only solves some cases that are detected at compile-time. It does
however fix the openacc private-atomic-* test-cases.
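For illustration, the kind of code the private-atomic-* test-cases exercise,
where the atomic operand is a private variable living in a stack frame (a
schematic C sketch, not one of the actual test-cases):
...
static int
count_in_private (void)
{
  int result = 0;
#pragma acc parallel num_gangs(1) copyout(result)
  {
    int counter = 0;  /* gang-private: lives in the gang's stack frame */
#pragma acc loop worker
    for (int i = 0; i < 64; i++)
      {
#pragma acc atomic update
        counter++;
      }
    result = counter;
  }
  return result;
}
...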
Tested on x86_64 with nvptx accelerator.
gcc/ChangeLog:
2022-01-27 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.md (define_insn "atomic_compare_and_swap<mode>_1")
(define_insn "atomic_exchange<mode>")
(define_insn "atomic_fetch_add<mode>")
(define_insn "atomic_fetch_addsf")
(define_insn "atomic_fetch_<logic><mode>"): Output non-atomic version
if the memory operand is frame-related.
gcc/testsuite/ChangeLog:
2022-01-31 Tom de Vries <tdevries@suse.de>
* gcc.target/nvptx/stack-atomics-run.c: New test.
libgomp/ChangeLog:
2022-01-27 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.oacc-c-c++-common/private-atomic-1.c: Remove
PR83812 workaround.
* testsuite/libgomp.oacc-fortran/private-atomic-1-vector.f90: Same.
* testsuite/libgomp.oacc-fortran/private-atomic-1-worker.f90: Same.
When running libgomp test-case broadcast-many.c on an nvptx accelerator
(T400, driver version 470.86), I run into:
...
libgomp: The Nvidia accelerator has insufficient resources to launch \
'main$_omp_fn$0' with num_workers = 32 and vector_length = 32; \
recompile the program with 'num_workers = x and vector_length = y' on \
that offloaded region or '-fopenacc-dim=:x:y' where x * y <= 896.
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/broadcast-many.c \
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \
-O0 execution test
...
The error does not occur when using GOMP_NVPTX_JIT=-O0.
Fix this by using 896 / 32 == 28 workers for ACC_DEVICE_TYPE_nvidia.
Likewise for some other test-cases.
Tested libgomp on x86_64 with nvptx accelerator.
libgomp/ChangeLog:
2022-01-27 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.oacc-c-c++-common/broadcast-many.c: Reduce
num_workers for nvidia accelerator to fix libgomp error 'insufficient
resources'.
* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c:
Same.
* testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Same.
When running the libgomp testsuite with GOMP_NVPTX_JIT=-O0 using an nvptx
accelerator (Nvidia T400, 2GB), I run into:
...
libgomp: cuCtxSynchronize error: unspecified launch failure \
(perhaps abort was called)
libgomp: cuMemFree_v2 error: unspecified launch failure
libgomp: device finalization failed
FAIL: libgomp.fortran/examples-4/declare_target-1.f90 -O0 execution test
...
The test-case contains:
...
! Reduced from 25 to 23, otherwise execution runs out of thread stack on
! Nvidia Titan V.
if (fib (23) /= fib_wrapper (23)) stop 2
...
Fix this by reducing the fib/fib_wrapper argument from 23 to 22.
Same for declare_target-2.f90.
Tested on x86_64 with nvptx accelerator.
libgomp/ChangeLog:
2022-01-27 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.fortran/examples-4/declare_target-1.f90: Reduce
recursion depth.
* testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Same.
Rather than rubber-stamp whatever requested vs. actual device kernel launch
configuration happens, actually (again) verify the requested values (modulo
expected variations).
This better highlights that "AMD GCN has an upper limit of 'num_workers(16)'",
and the deficiency that "AMD GCN uses the autovectorizer for the vector
dimension: the use of a function call in vector-partitioned code [...] is not
currently supported".
And, this removes several instances of race conditions, where variables are
concurrently written to in OpenACC gang-redundant mode.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c: Strengthen.
* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-v-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-v-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-w-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c: Likewise.