Commit Graph

1013 Commits

Author SHA1 Message Date
Jakub Jelinek
2c16eb3157 openmp: Add support for inoutset depend-kind
This patch adds support for inoutset depend-kind in depend
clauses.  It is very similar to the in depend-kind in that
a task with a dependency with that depend-kind is dependent
on all previously created sibling tasks with matching address
unless they have the same depend-kind.
In the in depend-kind case everything is dependent except
for in -> in dependency, for inoutset everything is
dependent except for inoutset -> inoutset dependency.
mutexinoutset is also similar (everything is dependent except
for mutexinoutset -> mutexinoutset dependency), but there is
also the additional restriction that only one task with
mutexinoutset for each address can be scheduled at once (i.e.
mutual exclusitivty).  For now we support mutexinoutset
the same as inout/out, but the inoutset support is full.

In order not to bump the ABI for dependencies each time
(we've bumped it already once, the old ABI supports only
inout/out and in depend-kind, the new ABI supports
inout/out, mutexinoutset, in and depobj), this patch arranges
for inoutset to be at least for the time being always handled
as if it was specified through depobj even when it is not.
So it uses the new ABI for that and inoutset are represented
like depobj - pointer to a pair of pointers where the first one
will be the actual address of the object mentioned in depend
clause and second pointer will be (void *) GOMP_DEPEND_INOUTSET.

2022-05-17  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* tree-core.h (enum omp_clause_depend_kind): Add
	OMP_CLAUSE_DEPEND_INOUTSET.
	* tree-pretty-print.cc (dump_omp_clause): Handle
	OMP_CLAUSE_DEPEND_INOUTSET.
	* gimplify.cc (gimplify_omp_depend): Likewise.
	* omp-low.cc (lower_depend_clauses): Likewise.
gcc/c-family/
	* c-omp.cc (c_finish_omp_depobj): Handle
	OMP_CLAUSE_DEPEND_INOUTSET.
gcc/c/
	* c-parser.cc (c_parser_omp_clause_depend): Parse
	inoutset depend-kind.
	(c_parser_omp_depobj): Likewise.
gcc/cp/
	* parser.cc (cp_parser_omp_clause_depend): Parse
	inoutset depend-kind.
	(cp_parser_omp_depobj): Likewise.
	* cxx-pretty-print.cc (cxx_pretty_printer::statement): Handle
	OMP_CLAUSE_DEPEND_INOUTSET.
gcc/testsuite/
	* c-c++-common/gomp/all-memory-1.c (boo): Add test with
	inoutset depend-kind.
	* c-c++-common/gomp/all-memory-2.c (boo): Likewise.
	* c-c++-common/gomp/depobj-1.c (f1): Likewise.
	(f2): Adjusted expected diagnostics.
	* g++.dg/gomp/depobj-1.C (f4): Adjust expected diagnostics.
include/
	* gomp-constants.h (GOMP_DEPEND_INOUTSET): Define.
libgomp/
	* libgomp.h (struct gomp_task_depend_entry): Change is_in type
	from bool to unsigned char.
	* task.c (gomp_task_handle_depend): Handle GOMP_DEPEND_INOUTSET.
	Ignore dependencies where
	task->depend[i].is_in && task->depend[i].is_in == ent->is_in
	rather than just task->depend[i].is_in && ent->is_in.  Remember
	whether GOMP_DEPEND_IN loop is needed and guard the loop with that
	conditional.
	(gomp_task_maybe_wait_for_dependencies): Handle GOMP_DEPEND_INOUTSET.
	Ignore dependencies where elem.is_in && elem.is_in == ent->is_in
	rather than just elem.is_in && ent->is_in.
	* testsuite/libgomp.c-c++-common/depend-1.c (test): Add task with
	inoutset depend-kind.
	* testsuite/libgomp.c-c++-common/depend-2.c (test): Likewise.
	* testsuite/libgomp.c-c++-common/depend-3.c (test): Likewise.
	* testsuite/libgomp.c-c++-common/depend-inoutset-1.c: New test.
2022-05-17 15:40:27 +02:00
Tobias Burnus
4f94c38a92 OpenMP: Add omp_all_memory support to Fortran
Fortran part to the C/C++/backend implementation
r13-337-g7f78783dbedca0183d193e475262ca3c489fd365

gcc/fortran/ChangeLog:

	* dump-parse-tree.cc (show_omp_namelist): Handle omp_all_memory.
	* openmp.cc (gfc_match_omp_variable_list, gfc_match_omp_depend_sink,
	gfc_match_omp_clauses, resolve_omp_clauses): Likewise.
	* trans-openmp.cc (gfc_trans_omp_clauses, gfc_trans_omp_depobj):
	Likewise.
	* resolve.cc (resolve_symbol): Reject it as symbol.

libgomp/ChangeLog:

	* libgomp.texi (OpenMP 5.1): Set omp_all_memory to 'Y'.
	* testsuite/libgomp.fortran/depend-5.f90: New test.
	* testsuite/libgomp.fortran/depend-6.f90: New test.
	* testsuite/libgomp.fortran/depend-7.f90: New test.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/all-memory-1.f90: New test.
	* gfortran.dg/gomp/all-memory-2.f90: New test.
	* gfortran.dg/gomp/all-memory-3.f90: New test.
2022-05-17 11:01:04 +02:00
Marcel Vollweiler
b4fb9f4f9a OpenMP, C++: Add template support for the has_device_addr clause.
This patch adds support for list items in the has_device_addr clause which type
is given by C++ template parameters.

gcc/cp/ChangeLog:

	* pt.cc (tsubst_omp_clauses): Added OMP_CLAUSE_HAS_DEVICE_ADDR.
	* semantics.cc (finish_omp_clauses): Added template decl processing.

libgomp/ChangeLog:

	* testsuite/libgomp.c++/target-has-device-addr-7.C: New test.
	* testsuite/libgomp.c++/target-has-device-addr-8.C: New test.
	* testsuite/libgomp.c++/target-has-device-addr-9.C: New test.
2022-05-16 01:02:50 -07:00
Tobias Burnus
70d624ff06 libgomp.fortran/target-nowait-array-section.f90: Fix typo
Fix typo as requested in the review approval.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/target-nowait-array-section.f90: New test.
2022-05-13 20:04:38 +02:00
Tobias Burnus
a46d626837 OpenMP/Fortran: Use firstprivat not alloc for ptr attach for arrays
For a non-descriptor array,  map(A(n:m)) was mapped as
  map(tofrom:A[n-1] [len: ...]) map(alloc:A [pointer assign, bias: ...])
with this patch, it is changed to
  map(tofrom:A[n-1] [len: ...]) map(firstprivate:A [pointer assign, bias: ...])

The latter avoids an alloc - and also avoids the race condition with
nowait in the enclosed testcase. (Note: predantically, the testcase is
invalid since OpenMP 5.1, violating the map clause restriction at [354:10-13].

gcc/fortran/ChangeLog:

	* trans-openmp.cc (gfc_trans_omp_clauses): When mapping nondescriptor
	array sections, use GOMP_MAP_FIRSTPRIVATE_POINTER instead of
	GOMP_MAP_POINTER for the pointer attachment.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/target-nowait-array-section.f90: New test.
2022-05-13 20:00:34 +02:00
Thomas Schwinge
dcc266796a Refactor '-ldl' handling for libgomp proper and plugins
Instead of implicit global 'LIBS="-ldl $LIBS"' via 'AC_CHECK_LIB', make
'-ldl' explicit for libgomp proper, and clean up 'PLUGIN_GCN_LIBS',
'PLUGIN_NVPTX_LIBS' accordingly.

	libgomp/
	* Makefile.am (libgomp_la_LIBADD): Initialize.
	* plugin/configfrag.ac (DL_LIBS): New.
	(PLUGIN_GCN_LIBS): Remove.
	(PLUGIN_NVPTX_LIBS): Don't set in the 'PLUGIN_NVPTX_DYNAMIC' case.
	* plugin/Makefrag.am (libgomp_la_LIBADD)
	(libgomp_plugin_gcn_la_LIBADD): Consider '$(DL_LIBS)'.
	(libgomp_plugin_nvptx_la_LIBADD) <PLUGIN_NVPTX_DYNAMIC>: Likewise.
	* Makefile.in: Regenerate.
	* config.h.in: Likewise.
	* configure: Likewise.
	* testsuite/Makefile.in: Likewise.
2022-05-12 15:11:30 +02:00
Thomas Schwinge
edbd2b1caa libgomp plugins: Don't 'AC_SUBST' and 'AC_DEFINE_UNQUOTED' for 'PLUGIN_GCN', 'PLUGIN_NVPTX'
Nothing ever used these.

	libgomp/
	* plugin/configfrag.ac: Don't 'AC_SUBST' and 'AC_DEFINE_UNQUOTED'
	for 'PLUGIN_GCN', 'PLUGIN_NVPTX'.
	* Makefile.in: Regenerate.
	* config.h.in: Likewise.
	* configure: Likewise.
	* testsuite/Makefile.in: Likewise.
2022-05-12 13:28:23 +02:00
Jakub Jelinek
7f78783dbe openmp: Add omp_all_memory support (C/C++ only so far)
The ugly part is that OpenMP 5.1 made omp_all_memory a reserved identifier
which isn't allowed to be used anywhere but in the depend clause, this is
against how everything else has been handled in OpenMP so far (where
some identifiers could have special meaning in some OpenMP clauses or
pragmas but not elsewhere).
The patch handles it by making it a conditional keyword (for -fopenmp
only) and emitting a better diagnostics when it is used in a primary
expression.  Having a nicer diagnostics when e.g. trying to do
int omp_all_memory;
or
int *omp_all_memory[10];
etc. would mean changing too many spots and hooking into name lookups
to reject declaring any such symbols would be too ugly and I'm afraid
there are way too many spots where one can introduce a name
(variables, functions, namespaces, struct, enum, enumerators, template
arguments, ...).

Otherwise, the handling is quite simple, normal depend clauses lower
into addresses of variables being handed over to the library, for
omp_all_memory I'm using NULL pointers.  omp_all_memory can only be
used with inout or out depend kinds and means that a task is dependent
on all previously created sibling tasks that have any dependency (of
any depend kind) and that any later created sibling tasks will be
dependent on it if they have any dependency.

2022-05-12  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* gimplify.cc (gimplify_omp_depend): Don't build_fold_addr_expr
	if null_pointer_node.
	(gimplify_scan_omp_clauses): Likewise.
	* tree-pretty-print.cc (dump_omp_clause): Print null_pointer_node
	as omp_all_memory.
gcc/c-family/
	* c-common.h (enum rid): Add RID_OMP_ALL_MEMORY.
	* c-omp.cc (c_finish_omp_depobj): Don't build_fold_addr_expr
	if null_pointer_node.
gcc/c/
	* c-parser.cc (c_parse_init): Register omp_all_memory as keyword
	if flag_openmp.
	(c_parser_postfix_expression): Diagnose uses of omp_all_memory
	in postfix expressions.
	(c_parser_omp_variable_list): Handle omp_all_memory in depend
	clause.
	* c-typeck.cc (c_finish_omp_clauses): Handle omp_all_memory
	keyword in depend clause as null_pointer_node, diagnose invalid
	uses.
gcc/cp/
	* lex.cc (init_reswords): Register omp_all_memory as keyword
	if flag_openmp.
	* parser.cc (cp_parser_primary_expression): Diagnose uses of
	omp_all_memory in postfix expressions.
	(cp_parser_omp_var_list_no_open): Handle omp_all_memory in depend
	clause.
	* semantics.cc (finish_omp_clauses): Handle omp_all_memory
	keyword in depend clause as null_pointer_node, diagnose invalid
	uses.
	* pt.cc (tsubst_omp_clause_decl): Pass through omp_all_memory.
gcc/testsuite/
	* c-c++-common/gomp/all-memory-1.c: New test.
	* c-c++-common/gomp/all-memory-2.c: New test.
	* c-c++-common/gomp/all-memory-3.c: New test.
	* g++.dg/gomp/all-memory-1.C: New test.
	* g++.dg/gomp/all-memory-2.C: New test.
libgomp/
	* libgomp.h (struct gomp_task): Add depend_all_memory member.
	* task.c (gomp_init_task): Initialize depend_all_memory.
	(gomp_task_handle_depend): Handle omp_all_memory.
	(gomp_task_run_post_handle_depend_hash): Clear
	parent->depend_all_memory if equal to current task.
	(gomp_task_maybe_wait_for_dependencies): Handle omp_all_memory.
	* testsuite/libgomp.c-c++-common/depend-1.c: New test.
	* testsuite/libgomp.c-c++-common/depend-2.c: New test.
	* testsuite/libgomp.c-c++-common/depend-3.c: New test.
2022-05-12 08:31:20 +02:00
Thomas Schwinge
876ac21b7e libgomp: Remove unused '--with-hsa-runtime', '--with-hsa-runtime-include', '--with-hsa-runtime-lib'
With recent commit 2e309a4eff "libgomp testsuite:
Don't amend 'LD_LIBRARY_PATH' for system-provided HSA Runtime library",
and commit d6adba3075 "libgomp GCN plugin:
 Clean up unused references to system-provided HSA Runtime library", the last
uses of '--with-hsa-runtime' etc. are gone.

	gcc/
	* doc/install.texi: Don't document '--with-hsa-runtime',
	'--with-hsa-runtime-include', '--with-hsa-runtime-lib'.
	libgomp/
	* plugin/configfrag.ac: Remove '--with-hsa-runtime',
	'--with-hsa-runtime-include', '--with-hsa-runtime-lib' processing.
	* Makefile.in: Regenerate.
	* configure: Likewise.
	* testsuite/Makefile.in: Likewise.
2022-05-11 14:27:42 +02:00
Thomas Schwinge
91a6dcd149 libgomp GCN plugin: Clean up always-empty 'PLUGIN_GCN_CPPFLAGS', 'PLUGIN_GCN_LDFLAGS'
After recent commit d6adba3075
"libgomp GCN plugin: Clean up unused references to system-provided HSA Runtime
library", these aren't set anymore.

	libgomp/
	* plugin/Makefrag.am (libgomp_plugin_gcn_la_CPPFLAGS): Don't
	consider 'PLUGIN_GCN_CPPFLAGS'.
	(libgomp_plugin_gcn_la_LDFLAGS): Don't consider
	'PLUGIN_GCN_LDFLAGS'.
	* plugin/configfrag.ac (PLUGIN_GCN_CPPFLAGS, PLUGIN_GCN_LDFLAGS):
	Remove.
	* Makefile.in: Regenerate.
	* configure: Likewise.
	* testsuite/Makefile.in: Likewise.
2022-05-11 14:25:58 +02:00
Thomas Schwinge
2e309a4eff libgomp testsuite: Don't amend 'LD_LIBRARY_PATH' for system-provided HSA Runtime library
This is only active if GCC is 'configure'd with '--with-hsa-runtime=[...]' or
'--with-hsa-runtime-lib=[...]' -- which nobody really is doing, as far as I can
tell.

'libgomp/testsuite/lib/libgomp.exp:libgomp_init' states:

    # For build-tree testing, also consider the library paths used for builing.
    # For installed testing, we assume all that to be provided in the sysroot.
    if { $blddir != "" } {
        [...]
        global hsa_runtime_lib
        if { $hsa_runtime_lib != "" } {
            append always_ld_library_path ":$hsa_runtime_lib"
        }
    }

However, the libgomp GCN plugin is unconditionally built against the
GCC-shipped 'include/hsa*.h' header files, and at run time does
'dlopen("libhsa-runtime64.so.1")', so there is no system-provided HSA Runtime
library "used for builing".  It thus doesn't make sense to amend
'LD_LIBRARY_PATH' for system-provided HSA Runtime library.

	libgomp/
	* testsuite/lib/libgomp.exp (libgomp_init): Don't
	'append always_ld_library_path ":$hsa_runtime_lib"'.
	* testsuite/libgomp-test-support.exp.in (hsa_runtime_lib): Don't set.
2022-05-11 14:24:21 +02:00
Thomas Schwinge
7981524755 Fix up 'libgomp.fortran/use_device_addr-5.f90' multi-device testing
Fix-up for recent commit r13-116-g3f8c389fe90bf565a6221a46bb7fb745dd4c1510
"OpenMP: Fix use_device_{addr,ptr} with in-data-sharing arg", where we
currently get:

    libgomp: use_device_ptr pointer wasn't mapped
    FAIL: libgomp.fortran/use_device_addr-5.f90   -O  execution test

	libgomp/
	* testsuite/libgomp.fortran/use_device_addr-5.f90: Fix up
	multi-device testing.
2022-05-10 14:48:11 +02:00
Marcel Vollweiler
4043f53cb4 OpenMP, libgomp: Add new runtime routine omp_target_is_accessible.
gcc/ChangeLog:

	* omp-low.cc (omp_runtime_api_call): Added target_is_accessible to
	omp_runtime_apis array.

libgomp/ChangeLog:

	* libgomp.map: Added omp_target_is_accessible.
	* libgomp.texi: Tagged omp_target_is_accessible as supported.
	* omp.h.in: Added omp_target_is_accessible.
	* omp_lib.f90.in: Added interface for omp_target_is_accessible.
	* omp_lib.h.in: Likewise.
	* target.c (omp_target_is_accessible): Added implementation of
	omp_target_is_accessible.
	* testsuite/libgomp.c-c++-common/target-is-accessible-1.c: New test.
	* testsuite/libgomp.fortran/target-is-accessible-1.f90: New test.
2022-05-06 07:28:26 -07:00
Tobias Burnus
3f8c389fe9 OpenMP: Fix use_device_{addr,ptr} with in-data-sharing arg
For array-descriptor vars, the descriptor is assigned to a temporary. However,
this failed when the clause's argument was in turn in a data-sharing clause
as the outer context's VALUE_EXPR wasn't used.

gcc/ChangeLog:

	* omp-low.cc (lower_omp_target): Fix use_device_{addr,ptr} with list
	item that is in an outer data-sharing clause.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/use_device_addr-5.f90: New test.
2022-05-04 18:18:44 +02:00
Marcel Vollweiler
941cdc8b6d OpenMP, libgomp: Add new runtime routine omp_get_mapped_ptr.
This patch adds the OpenMP runtime routine "omp_get_mapped_ptr" which was
introduced in OpenMP 5.1.

gcc/ChangeLog:

	* omp-low.cc (omp_runtime_api_call): Added get_mapped_ptr to
	omp_runtime_apis array.

libgomp/ChangeLog:

	* libgomp.map: Added omp_get_mapped_ptr.
	* libgomp.texi: Tagged omp_get_mapped_ptr as supported.
	* omp.h.in: Added omp_get_mapped_ptr.
	* omp_lib.f90.in: Added interface for omp_get_mapped_ptr.
	* omp_lib.h.in: Likewise.
	* target.c (omp_get_mapped_ptr): Added implementation of
	omp_get_mapped_ptr.
	* testsuite/libgomp.c-c++-common/get-mapped-ptr-1.c: New test.
	* testsuite/libgomp.c-c++-common/get-mapped-ptr-2.c: New test.
	* testsuite/libgomp.c-c++-common/get-mapped-ptr-3.c: New test.
	* testsuite/libgomp.c-c++-common/get-mapped-ptr-4.c: New test.
	* testsuite/libgomp.fortran/get-mapped-ptr-1.f90: New test.
	* testsuite/libgomp.fortran/get-mapped-ptr-2.f90: New test.
	* testsuite/libgomp.fortran/get-mapped-ptr-3.f90: New test.
	* testsuite/libgomp.fortran/get-mapped-ptr-4.f90: New test.
2022-05-02 23:56:44 -07:00
Thomas Schwinge
2a570f11a2 Fix up 'libgomp.oacc-fortran/print-1.f90' GCN offloading compilation [PR104717]
That got broken by recent commit b220243191
"fortran: Fix up gfc_trans_oacc_construct [PR104717]".

	PR fortran/104717
	libgomp/
	* testsuite/libgomp.oacc-fortran/print-1.f90: Add OpenACC
	privatization scanning.  For GCN offloading compilation, raise
	'-mgang-private-size'.
2022-04-28 15:15:29 +02:00
Jakub Jelinek
b220243191 fortran: Fix up gfc_trans_oacc_construct [PR104717]
So that move_sese_region_to_fn works properly, OpenMP/OpenACC constructs
for which that function is invoked need an extra artificial BIND_EXPR
around their body so that we move all variables of the bodies.

The C/C++ FEs do that both for OpenMP constructs like OMP_PARALLEL, OMP_TASK
or OMP_TARGET and for OpenACC constructs that behave similarly to
OMP_TARGET, but the Fortran FE only does that for OpenMP constructs.

The following patch does that for OpenACC constructs too.

	PR fortran/104717
	gcc/fortran/
	* trans-openmp.cc (gfc_trans_oacc_construct): Wrap construct body
	in an extra BIND_EXPR.
	gcc/testsuite/
	* gfortran.dg/goacc/pr104717.f90: New test.
	* gfortran.dg/goacc/privatization-1-compute-loop.f90: Adjust.
	libgomp/
	* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Adjust.

Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
2022-04-25 23:14:02 +02:00
Chung-Lin Tang
b0af8e3a50 OpenMP: Fix nested use_device_ptr
This patch fixes a bug in lower_omp_target, where for Fortran arrays,
the expanded sender assignment is wrongly using the variable in the
current ctx, instead of the one looked-up outside, which is causing
use_device_ptr/addr to fail to work when used inside an omp-parallel
(where the omp child_fn is split away from the original).

The fix is inside omp-low.cc, though because the omp_array_data langhook
is used only by Fortran, this is essentially Fortran-specific.

2022-04-05  Chung-Lin Tang  <cltang@codesourcery.com>

gcc/ChangeLog:

	* omp-low.cc (lower_omp_target): Use outer context looked-up 'var' as
	argument to lang_hooks.decls.omp_array_data, instead of 'ovar' from
	current clause.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/use_device_ptr-4.f90: New testcase.
2022-04-05 08:31:34 -07:00
Tom de Vries
88cffa1a07 [libgomp/testsuite] Fix libgomp.fortran/examples-4/declare_target-{1,2}.f90
The test-cases libgomp.fortran/examples-4/declare_target-{1,2}.f90 mean to
set an nvptx-specific limit using offload_target_nvptx, but also change
behaviour for amd.

That is, there is now a difference in behaviour between:
- a compiler configured for GCN offloading, and
- a compiler configured for both GCN and nvptx offloading.

Fix this by using instead on_device_arch_nvptx.

Tested on x86_64 with nvptx accelerator.

libgomp/ChangeLog:

2022-04-04  Tom de Vries  <tdevries@suse.de>

	* testsuite/libgomp.fortran/examples-4/declare_target-1.f90: Use
	on_device_arch_nvptx instead of offload_target_nvptx.
	* testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Same.
2022-04-04 13:37:19 +02:00
Tom de Vries
bfa9f660d2 [libgomp, testsuite, nvptx] Limit recursion in declare_target-{1,2}.f90
When running testcases libgomp.fortran/examples-4/declare_target-{1,2}.f90 on
an RTX A2000 (sm_86) with driver 510.60.02 and with GOMP_NVPTX_JIT=-O0 I run
into:
...
FAIL: libgomp.fortran/examples-4/declare_target-1.f90 -O0 \
  -DGOMP_NVPTX_JIT=-O0 execution test
FAIL: libgomp.fortran/examples-4/declare_target-2.f90 -O0 \
  -DGOMP_NVPTX_JIT=-O0 execution test
...

Fix this by further limiting recursion depth in the test-cases for nvptx.

Furthermore, make the recursion depth limiting nvptx-specific.

Tested on x86_64 with nvptx accelerator.

libgomp/ChangeLog:

2022-04-01  Tom de Vries  <tdevries@suse.de>

	* testsuite/libgomp.fortran/examples-4/declare_target-1.f90: Define
	and use REC_DEPTH.
	* testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Same.
2022-04-01 13:23:16 +02:00
Tom de Vries
065e25f633 [libgomp, testsuite, nvptx] Fix dg-output test in vector-length-128-7.c
When running test-case libgomp.oacc-c-c++-common/vector-length-128-7.c on an
RTX A2000 (sm_86) with driver 510.60.02 I run into:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-7.c \
  -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  \
  output pattern test
...

The failing check verifies the launch dimensions:
...
/* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: \
                launch gangs=1, workers=8, vectors=128" } */
...
which fails because (as we can see with GOMP_DEBUG=1) the actual num_workers
is 6:
...
  nvptx_exec: kernel main$_omp_fn$0: launch gangs=1, workers=6, vectors=128
...

This is due to the result of cuOccupancyMaxPotentialBlockSize (which suggests
'a launch configuration with reasonable occupancy') printed just before:
...
cuOccupancyMaxPotentialBlockSize: grid = 52, block = 768
...
[ Note: 6 * 128 == 768. ]

Fix this by updating the check to allow num_workers in the range 1 to 8.

Tested on x86_64 with nvptx accelerator.

libgomp/ChangeLog:

2022-04-01  Tom de Vries  <tdevries@suse.de>

	* testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Fix
	num_workers check.
2022-04-01 13:22:07 +02:00
Tom de Vries
8570cce7c7 [libgomp, testsuite] Scale down some OpenACC test-cases
When a display manager is running on an nvidia card, all CUDA kernel launches
get a 5 seconds watchdog timer.

Consequently, when running the libgomp testsuite with nvptx accelerator and
GOMP_NVPTX_JIT=-O0 we run into a few FAILs like this:
...
libgomp: cuStreamSynchronize error: the launch timed out and was terminated
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \
  -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 \
  execution test
...

Fix this by scaling down the failing test-cases by default, and reverting to
the original behaviour for GCC_TEST_RUN_EXPENSIVE=1.

Tested on x86_64-linux with nvptx accelerator.

libgomp/ChangeLog:

2022-03-25  Tom de Vries  <tdevries@suse.de>

	PR libgomp/105042
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Reduce
	execution time.
	* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Same.
	* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Same.
2022-03-25 13:51:48 +01:00
Tobias Burnus
1002a7ace1 LTO: Fixes for renaming issues with offload/OpenMP [PR104285]
gcc/lto/ChangeLog:

	PR middle-end/104285
	* lto-partition.cc (maybe_rewrite_identifier): Use get_identifier
	for the returned string to be usable as hash key.
	(validize_symbol_for_target): Hence, use return value directly.
	(privatize_symbol_name_1): Track maybe_rewrite_identifier renames.
	* lto.cc (offload_handle_link_vars): Move function up before ...
	(do_whole_program_analysis): Call it after static renamings.
	(lto_main): Move call after static renamings.

libgomp/ChangeLog:

	PR middle-end/104285
	* testsuite/libgomp.c++/target-same-name-2-a.C: New test.
	* testsuite/libgomp.c++/target-same-name-2-b.C: New test.
	* testsuite/libgomp.c++/target-same-name-2.C: New test.
	* testsuite/libgomp.c-c++-common/target-same-name-1-a.c: New test.
	* testsuite/libgomp.c-c++-common/target-same-name-1-b.c: New test.
	* testsuite/libgomp.c-c++-common/target-same-name-1.c: New test.
2022-03-23 09:44:39 +01:00
Tom de Vries
a624388b95 [nvptx] Add warp sync at simt exit
Consider this code (with N defined to 1024):
...
  float v = 0.0;
  #pragma omp target map(tofrom: v)
  #pragma omp parallel for simd
  for (int i = 0 ; i < N; i++)
    {
      #pragma omp atomic update
      v = v + 1.0;
    }
...

It hangs when executing on target board unix/-foffload=-misa=sm_75, using
drivers 470.103.01 and 510.54 on a T400 board (sm_75).

I'm tentatively identifying the problem as a bug in -muniform-simt for
architectures that support Independent Thread Scheduling (sm_70 and later).

The problem -muniform-simt is trying to address is to make sure that a
register produced outside an openmp simd region is available when used in any
lane inside an simd region.

The solution is to, outside an simd region, execute in all warp lanes, thus
producing consistent values in result registers in each warp thread.

This approach doesn't work when executing in all warp lanes multiplies the
side effects from 1 to 32 separate side effects, which is the case for atomic
insns.  So atomic insns are rewritten to execute only in lane 0, and if
there are any results, those are propagated to the other threads in the warp.
[ And likewise for system calls malloc, free, vprintf. ]

Now, consider a non-atomic update: ld, add, store.  The store has side
effects, are those multiplied or not?

Pre-sm_70 we can assume that at the end of an SIMT region, any divergent
control flow has reconverged, and we have a uniform warp, executing in lock
step.  So:
- the load will load the same value into the result register across the warp,
- the add will write the same value into the result register across the warp,
- the store will write the same value to the same memory location, 32 times,
  at once, having the result of a single store.
So, no side-effect multiplication (well, at least that's the observation).

Starting sm_70, the threads in a warp are no longer guaranteed to reconverge
after divergence.  There's a "Convergence Optimizer" that can can identify
that it is safe for a warp to reconverge, but that works only as long as the
code does not contain "synchronizing operations".

Consequently, the ld, add, store sequence can be executed by a non-uniform
warp, which means the side effects can have multiplied, and the registers are
no longer guarantueed to be in sync.

The atomic update in the example above is translated using an atom.cas loop,
which means that we have divergence (because only one thread is allowed to
succeed at a time) and the "Convergence Optimizer" doesn't reconverge probably
because the atom.cas counts as a "synchronizing operation".  So, it seems
plausible that the root cause for the mentioned hang is the problem described
above.

Fix this by adding an explicit warp sync at simt exit.

Note that we're assuming here that the warp will stay uniform until the next
SIMT region entry.

Tested on x86_64 with nvptx accelerator.

gcc/ChangeLog:

2022-03-09  Tom de Vries  <tdevries@suse.de>

	PR target/104916
	PR target/104783
	* config/nvptx/nvptx.md (define_expand "omp_simt_exit"): Emit warp
	sync (or uniform warp check for mptx < 6.0).

libgomp/ChangeLog:

2022-03-15  Tom de Vries  <tdevries@suse.de>

	PR target/104916
	PR target/104783
	* testsuite/libgomp.c/pr104783-2.c: New test.
2022-03-22 14:35:34 +01:00
Tobias Burnus
c133bdfa9e Fortran/OpenMP: Fix privatization of associated names
gfc_omp_predetermined_sharing cases the associate-name pointer variable
to be OMP_CLAUSE_DEFAULT_FIRSTPRIVATE, which is fine. However, the associated
selector is shared. Thus, the target of associate-name pointer should not get
copied. (It was before but because of gfc_omp_privatize_by_reference returning
false, the selector was not only wrongly copied but this was also not done
properly.)

gcc/fortran/ChangeLog:

	PR fortran/103039
	* trans-openmp.cc (gfc_omp_clause_copy_ctor, gfc_omp_clause_dtor):
	Only privatize pointer for associate names.

libgomp/ChangeLog:

	PR fortran/103039
	* testsuite/libgomp.fortran/associate4.f90: New test.
2022-03-18 17:40:22 +01:00
Tom de Vries
093cdadbce [openmp] Fix SIMT reduction using TRUTH_{AND,OR}IF_EXPR
Consider test-case pr104952-1.c, included in this commit, containing:
...
  #pragma omp target map(tofrom:result) map(to:arr)
  #pragma omp simd reduction(||: result)
...

When run on x86_64 with nvptx accelerator, the test-case either aborts or
hangs.

The reduction clause is translated by the SIMT code (active for nvptx) as a
butterfly reduction loop with this butterfly shuffle / update pair:
...
  D.2163 = D.2163 || .GOMP_SIMT_XCHG_BFLY (D.2163, D.2164)
...
in the loop body.

The problem is that the butterfly shuffle is possibly not executed, while it
needs to be executed unconditionally.

Fix this by translating instead as:
...
  D.tmp_bfly = .GOMP_SIMT_XCHG_BFLY (D.2163, D.2164)
  D.2163 = D.2163 || D.tmp_bfly
...

Tested on x86_64-linux with nvptx accelerator.

gcc/ChangeLog:

2022-03-17  Tom de Vries  <tdevries@suse.de>

	PR target/104952
	* omp-low.cc (lower_rec_input_clauses): Make sure GOMP_SIMT_XCHG_BFLY
	is executed unconditionally.

libgomp/ChangeLog:

2022-03-17  Tom de Vries  <tdevries@suse.de>

	PR target/104952
	* testsuite/libgomp.c/pr104952-1.c: New test.
	* testsuite/libgomp.c/pr104952-2.c: New test.
2022-03-18 15:45:13 +01:00
Thomas Schwinge
c43cb355f2 Enhance further testcases to verify Openacc 'kernels' decomposition
gcc/testsuite/
	* c-c++-common/goacc-gomp/nesting-1.c: Enhance.
	* c-c++-common/goacc/kernels-loop-g.c: Likewise.
	* c-c++-common/goacc/nesting-1.c: Likewise.
	* gcc.dg/goacc/nested-function-1.c: Likewise.
	* gfortran.dg/goacc/common-block-3.f90: Likewise.
	* gfortran.dg/goacc/nested-function-1.f90: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c:
	Enhance.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c: Likewise.
	* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
2022-03-17 08:51:32 +01:00
Thomas Schwinge
004fc4f2fc Enhance further testcases to verify handling of OpenACC privatization level [PR90115]
As originally introduced in commit 11b8286a83
"[OpenACC privatization] Largely extend diagnostics and corresponding testsuite
coverage [PR90115]".

	PR middle-end/90115
	gcc/testsuite/
	* c-c++-common/goacc-gomp/nesting-1.c: Enhance.
	* gfortran.dg/goacc/common-block-3.f90: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Enhance.
	* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
2022-03-17 08:47:09 +01:00
Marcel Vollweiler
be093b8dcc OpenMP, Fortran: Bugfix for omp_set_num_teams.
This patch fixes a small bug in the omp_set_num_teams implementation.

libgomp/ChangeLog:

	* fortran.c (omp_set_num_teams_8_): Call omp_set_num_teams instead of
	omp_set_max_active_levels.
	* testsuite/libgomp.fortran/icv-8.f90: New test.
2022-03-16 07:38:54 -07:00
Thomas Schwinge
ab46fc7c3b OpenACC privatization diagnostics vs. 'assert' [PR102841]
It's an orthogonal concern why these diagnostics do appear at all for
non-offloaded OpenACC constructs (where they're not relevant at all); PR90115.

Depending on how 'assert' is implemented, it may cause temporaries to be
created, and/or may lower into 'COND_EXPR's, and
'gcc/gimplify.cc:gimplify_cond_expr' uses 'create_tmp_var (type, "iftmp")'.

Fix-up for commit 11b8286a83
"[OpenACC privatization] Largely extend diagnostics and
corresponding testsuite coverage [PR90115]".

	PR testsuite/102841
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/host_data-7.c: Adjust.
2022-03-16 10:12:09 +01:00
Thomas Schwinge
a07b8f4fb7 OpenACC 'kernels' decomposition: resolve wrong-code cases unless manually making certain variables addressable [PR100280, PR104892]
Currently in OpenACC 'kernels' decomposition, there is special handling of
'GOMP_MAP_FORCE_TOFROM', documented to be done to avoid "internal compiler
errors in later passes".  For performance reasons, the current repetitive
to/from device copying for every region is not ideal, compared to using
'present' clauses, as done for almost all other 'GOMP_MAP_*'.  Also, the
current special handling (incomplete, evidently) is the reason for the PR104892
misbehavior.  For PR100280 etc. we've resolved all such known ICEs -- removing
the special handling for 'GOMP_MAP_FORCE_TOFROM' now resolves PR104892.

	PR middle-end/100280
	PR middle-end/104892
	gcc/
	* omp-oacc-kernels-decompose.cc (omp_oacc_kernels_decompose_1):
	Remove special handling of 'GOMP_MAP_FORCE_TOFROM'.
	gcc/testsuite/
	* c-c++-common/goacc/kernels-decompose-2.c: Adjust.
	* c-c++-common/goacc/kernels-decompose-pr100400-1-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr100400-1-2.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr100400-1-3.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr100400-1-4.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-2.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104774-1.c: Likewise.
	* gfortran.dg/goacc/classify-kernels.f95: Likewise.
	* gfortran.dg/goacc/kernels-decompose-2.f95: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Adjust.
	* testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
	* testsuite/libgomp.oacc-fortran/asyncwait-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90:
	Likewise.
2022-03-12 15:37:27 +01:00
Thomas Schwinge
535afbd959 OpenACC 'kernels' decomposition: wrong-code cases unless manually making certain variables addressable [PR104892]
Document a few examples of the status quo.

	PR middle-end/104892
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Point
	to PR104892.
	* testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise,
	enable '--param=openacc-kernels=decompose' and adjust.
	* testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90:
	Likewise.
2022-03-12 15:37:27 +01:00
Thomas Schwinge
2e53fa7bb2 Enhance further testcases to verify handling of OpenACC privatization level [PR90115]
As originally introduced in commit 11b8286a83
"[OpenACC privatization] Largely extend diagnostics and corresponding testsuite
coverage [PR90115]".

	PR middle-end/90115
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/default-1.c: Enhance.
	* testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90: Likewise.
2022-03-12 14:00:46 +01:00
Thomas Schwinge
337ed336d7 OpenACC 'kernels' decomposition: Mark variables used in 'present' clauses as addressable [PR100280, PR104086]
... like in recent commit 9b32c1669a
"OpenACC 'kernels' decomposition: Mark variables used in synthesized
data clauses as addressable [PR100280]".  Otherwise, we may run into
'gcc/omp-low.cc:lower_omp_target':

    13125                       else if (is_gimple_reg (var))
    13126                         {
    13127                           gcc_assert (offloaded);

	PR middle-end/100280
	PR middle-end/104086
	gcc/
	* omp-oacc-kernels-decompose.cc (omp_oacc_kernels_decompose_1):
	Mark variables used in 'present' clauses as addressable.
	* omp-low.cc (scan_sharing_clauses) <OMP_CLAUSE_MAP>: Gracefully
	handle duplicate 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE'.
	gcc/testsuite/
	* c-c++-common/goacc/kernels-decompose-pr104086-1.c: Adjust,
	extend.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
	Merge this...
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c:
	..., and this...
	* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: ... into
	this, and adjust.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Extend.
2022-03-12 13:02:55 +01:00
Hafiz Abid Qadeer
7c2ac3cebd Fix multiple issue in the testcase allocate-1.f90.
1. Thomas reported in
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589039.html
that this testcase is randomly failing. The problem was fixed pool
size which was exhausted when there were a lot of threads. Fixed it
by removing pool_size trait which causes default pool size to be used
which should be big enough.

2. Array indices have been changed to check the last element in the
array.

3. Remove a redundant assignment and move some code to better match
C testcase.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/allocate-1.f90: Remove pool_size
	trait.  Test last index in w and v array.  Remove redundant
	assignment to V(1).  Move alignment checks at the end of
	parallel region.
2022-03-10 18:43:50 +00:00
Tom de Vries
f07178ca3c [nvptx] Disable warp sync in simt region
I ran into a hang for this code:
...
  #pragma omp target map(tofrom: counter_N0)
  #pragma omp simd
  for (int i = 0 ; i < 1 ; i++ )
    {
      #pragma omp atomic update
      counter_N0 = counter_N0 + 1 ;
    }
...

This has to do with the nature of -muniform-simt.  It has two modes of
operation: inside and outside an SIMT region.

Outside an SIMT region, a warp pretends to execute a single thread, but
actually executes in all threads, to keep the local registers in all threads
consistent.  This approach works unless the insn that is executed is a syscall
or an atomic insn.  In that case, the insn is predicated, such that it
executes in only one thread.  If the predicated insn writes a result to a
register, then that register is propagated to the other threads, after which
the local registers in all threads are consistent again.

Inside an SIMT region, a warp executes in all threads.  However, the
predication and propagation for syscalls and atomic insns is also present
here, because nvptx_reorg_uniform_simt works on all code.  Care has been taken
though to ensure that the predication and propagation is a nop.  That is,
inside an SIMT region:
- the predicate evalutes to true for each thread, and
- the propagation insn copies a register from each thread to the same thread.

That works fine, until we use -mptx=6.0, and instead of using the deprecated
warp propagation insn shfl, we start using shfl.sync:
...
  @%r33 atom.add.u32		_, [%r29], 1;
	shfl.sync.idx.b32	%r30, %r30, %r32, 31, 0xffffffff;
...

The shfl.sync specifies a member mask indicating all threads, but given that
the loop only has a single iteration, only thread 0 will execute the insn,
where it will hang waiting for the other threads.

Fix this by predicating the shfl.sync (and likewise, bar.warp.sync and the
uniform warp check) such that it only executes outside the SIMT region.

Tested on x86_64 with nvptx accelerator.

gcc/ChangeLog:

2022-03-08  Tom de Vries  <tdevries@suse.de>

	PR target/104783
	* config/nvptx/nvptx.cc (nvptx_init_unisimt_predicate)
	(nvptx_output_unisimt_switch): Handle unisimt_outside_simt_predicate.
	(nvptx_get_unisimt_outside_simt_predicate): New function.
	(predicate_insn): New function, factored out of ...
	(nvptx_reorg_uniform_simt): ... here.  Predicate all emitted insns.
	* config/nvptx/nvptx.h (struct machine_function): Add
	unisimt_outside_simt_predicate field.
	* config/nvptx/nvptx.md (define_insn "nvptx_warpsync")
	(define_insn "nvptx_uniform_warp_check"): Make predicable.

libgomp/ChangeLog:

2022-03-10  Tom de Vries  <tdevries@suse.de>

	* testsuite/libgomp.c/pr104783.c: New test.
2022-03-10 12:20:44 +01:00
Thomas Schwinge
7a5e036b61 [OpenACC privatization] Analyze 'lookup_decl'-translated DECL [PR90115, PR102330, PR104774]
... so that it matches what we analyze and what we action on.
Fix-up for commit 29a2f51806 "openacc:
Add support for gang local storage allocation in shared memory [PR90115]".

	PR middle-end/90115
	PR middle-end/102330
	PR middle-end/104774
	gcc/
	* omp-low.cc (oacc_privatization_candidate_p)
	(oacc_privatization_scan_clause_chain)
	(oacc_privatization_scan_decl_chain, lower_oacc_private_marker):
	Analyze 'lookup_decl'-translated DECL.
	gcc/testsuite/
	* c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Adjust.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104774-1.c: Likewise.
	* c-c++-common/goacc/privatization-1-compute-loop.c: Likewise.
	* c-c++-common/goacc/privatization-1-compute.c: Likewise.
	* c-c++-common/goacc/privatization-1-routine_gang-loop.c:
	Likewise.
	* c-c++-common/goacc/privatization-1-routine_gang.c: Likewise.
	* gfortran.dg/goacc-gomp/pr102330-1.f90: Likewise, and subsume...
	* gfortran.dg/goacc-gomp/pr102330-2.f90: ... this file, and...
	* gfortran.dg/goacc-gomp/pr102330-3.f90: ... this file.
	* gfortran.dg/goacc/privatization-1-compute-loop.f90: Adjust.
	* gfortran.dg/goacc/privatization-1-compute.f90: Likewise.
	* gfortran.dg/goacc/privatization-1-routine_gang-loop.f90:
	Likewise.
	* gfortran.dg/goacc/privatization-1-routine_gang.f90: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Enhance.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c:
	Adjust.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c:
	Likewise.
	* testsuite/libgomp.oacc-fortran/optional-private.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: Likewise.
	* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Likewise.
2022-03-10 12:06:28 +01:00
Thomas Schwinge
1d9dc3dd74 Enhance further testcases to verify handling of OpenACC privatization level [PR90115]
As originally introduced in commit 11b8286a83
"[OpenACC privatization] Largely extend diagnostics and corresponding testsuite
coverage [PR90115]".

	PR middle-end/90115
	gcc/testsuite/
	* c-c++-common/goacc/nesting-1.c: Enhance.
	* gcc.dg/goacc/nested-function-1.c: Likewise.
	* gcc.dg/goacc/nested-function-2.c: Likewise.
	* gfortran.dg/goacc/nested-function-1.f90: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-fortran/routine-1.f90: Enhance.
	* testsuite/libgomp.oacc-fortran/routine-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/routine-3.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/routine-9.f90: Likewise.
2022-03-10 11:24:07 +01:00
Thomas Schwinge
14dfbb5359 Fix 'libgomp.oacc-c-c++-common/kernels-decompose-1.c' expected diagnostics
Fix-up for recent commit 8935589b49
"OMP lowering: Regimplify 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs
[PR100280, PR104132, PR104133]": adjust for a GCN offloading workaround
added just before commit: '(volatile void *) &f1;'.

	PR testsuite/104791
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Fix
	expected diagnostics.
2022-03-04 20:42:29 +01:00
Thomas Schwinge
e28eb86c18 Test 'libgomp.oacc-*/kernels-private-vars-*' with '--param=openacc-kernels=decompose' [PR104784]
Before recent commit 8935589b49
"OMP lowering: Regimplify 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs
[PR100280, PR104132, PR104133]", 'libgomp.oacc-c' testing already worked fine,
but 'libgomp.oacc-c++' testing ICEed.  Via the commit mentioned, the C++
testing ICEs are now resolved, but the underlying issue remains to be looked
into: PR104784 "OpenACC 'kernels' decomposition: C vs. C++ differences".

	PR middle-end/104784
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c:
	Test with '--param=openacc-kernels=decompose'.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-3.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-4.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-5.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-6.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-1.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-1.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-2.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-1.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-2.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-3.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-4.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-5.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-6.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-7.f90:
	Likewise.
2022-03-04 15:47:06 +01:00
Thomas Schwinge
07395f19df Test '-fopt-info-omp-all' in 'libgomp.oacc-*/kernels-private-vars-*'
libgomp/
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c:
	Test '-fopt-info-omp-all'.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-3.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-4.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-5.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-6.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-1.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-1.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-2.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-1.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-2.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-3.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-4.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-5.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-6.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-7.f90:
	Likewise.
2022-03-04 14:47:19 +01:00
Thomas Schwinge
8935589b49 OMP lowering: Regimplify 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs [PR100280, PR104132, PR104133]
... by generalizing the existing 'gcc/omp-low.cc:task_shared_vars'.

Fix-up for commit 9b32c1669a
"OpenACC 'kernels' decomposition: Mark variables used in
synthesized data clauses as addressable [PR100280]".

	PR middle-end/100280
	PR middle-end/104132
	PR middle-end/104133
	gcc/
	* omp-low.cc (task_shared_vars): Rename to
	'make_addressable_vars'.  Adjust all users.
	(scan_sharing_clauses) <OMP_CLAUSE_MAP> Use it for
	'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs, too.
	gcc/testsuite/
	* c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Adjust.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Extend.
2022-03-04 14:21:01 +01:00
Thomas Schwinge
de6e81ea96 OpenACC 'kernels' decomposition: Move 'TREE_ADDRESSABLE' setting into OMP lowering [PR100280]
... in preparation for later changes.  No functional change.

Follow-up to commit 9b32c1669a
"OpenACC 'kernels' decomposition: Mark variables used in
synthesized data clauses as addressable [PR100280]".

	PR middle-end/100280
	gcc/
	* tree.h (OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE): New.
	* tree-core.h: Document it.
	* omp-low.cc (scan_sharing_clauses) <OMP_CLAUSE_MAP>: Handle
	'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE'.
	* omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region):
	Set 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' instead of
	'TREE_ADDRESSABLE'.
	gcc/testsuite/
	* c-c++-common/goacc/classify-kernels-unparallelized.c: Adjust.
	* c-c++-common/goacc/classify-kernels.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-2.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr100280-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-2.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Adjust.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Likewise.
2022-03-04 14:21:01 +01:00
Thomas Schwinge
e5ae22c561 Add diagnostic: "note: OpenACC 'kernels' decomposition: variable '[...]' declared in block made addressable" [PR100280]
Follow-up to commit 9b32c1669a
"OpenACC 'kernels' decomposition: Mark variables used in
synthesized data clauses as addressable [PR100280]".

	PR middle-end/100280
	gcc/
	* omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region):
	Add diagnostic: "note: OpenACC 'kernels' decomposition: variable
	'[...]' declared in block made addressable".
	gcc/testsuite/
	* c-c++-common/goacc/classify-kernels-unparallelized.c: Add
	'--param=openacc-privatization=noisy'.
	* c-c++-common/goacc/classify-kernels.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-2.c: Adjust.
	* c-c++-common/goacc/kernels-decompose-pr100280-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-2.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Adjust.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Likewise.
2022-03-04 14:21:00 +01:00
Tom de Vries
f485b0ed7d [libgomp, testsuite, nvptx] Add -mptx=_ in declare-variant-3-sm*.c
When running with target board unix/-foffload=-mptx=3.1, we run into:
...
lto1: error: PTX version (-mptx) needs to be at least 4.2 to support \
  selected -misa (sm_53)^M
mkoffload: fatal error: x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned \
  1 exit status^M
compilation terminated.^M
  ...
FAIL: libgomp.c/declare-variant-3-sm53.c (test for excess errors)
...

Fix this by adding -foffload=-mptx=_ in the libgomp.c/declare-variant-3-sm*.c
test-cases.

Tested on x86_64 with nvptx accelerator.

libgomp/ChangeLog:

2022-02-28  Tom de Vries  <tdevries@suse.de>

	* testsuite/libgomp.c/declare-variant-3-sm30.c: Add -foffload=-mptx=_.
	* testsuite/libgomp.c/declare-variant-3-sm35.c: Same.
	* testsuite/libgomp.c/declare-variant-3-sm53.c: Same.
	* testsuite/libgomp.c/declare-variant-3-sm70.c: Same.
	* testsuite/libgomp.c/declare-variant-3-sm75.c: Same.
	* testsuite/libgomp.c/declare-variant-3-sm80.c: Same.
2022-02-28 10:10:51 +01:00
Tom de Vries
59b8ade887 [libgomp, testsuite, nvptx] Add libgomp.c/declare-variant-3-sm*.c
Add openmp test-cases that test the omp declare variant construct:
...
  #pragma omp declare variant (f30) match (device={isa("sm_30")})
...
using the available nvptx isas.

Only the one for sm_30 is a dg-do run test-case, the other ones are dg-do
link.

Tested on x86_64 with nvptx accelerator.

libgomp/ChangeLog:

2022-02-24  Tom de Vries  <tdevries@suse.de>

	* testsuite/libgomp.c/declare-variant-3-sm30.c: New test.
	* testsuite/libgomp.c/declare-variant-3-sm35.c: New test.
	* testsuite/libgomp.c/declare-variant-3-sm53.c: New test.
	* testsuite/libgomp.c/declare-variant-3-sm70.c: New test.
	* testsuite/libgomp.c/declare-variant-3-sm75.c: New test.
	* testsuite/libgomp.c/declare-variant-3-sm80.c: New test.
	* testsuite/libgomp.c/declare-variant-3.h: New header file.
2022-02-24 11:41:03 +01:00
Thomas Schwinge
f8187b5c0d Fix OpenACC gang-redundant execution in 'libgomp.oacc-fortran/privatized-ref-2.f90'
This was a latent problem, and this commit here now resolves a regression that
after recent commit a78b1ab1df
"amdgcn: Tune default OpenMP/OpenACC GPU utilization" we had (only) seen on a
GCN offloading '-march=gfx908' system:

    {+WARNING: program timed out.+}
    [-PASS:-]{+FAIL:+} libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O0  execution test

Same for other optimization levels.

Make sure that we're not executing non-parallelized code in gang-redundant
mode, by putting these parts into their own 'parallel' constructs, which then
default to 'num_gangs(1)'.

	libgomp/
	* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Fix OpenACC
	gang-redundant execution.
2022-02-22 17:32:03 +01:00
Tom de Vries
5ed77fb3ed [libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end
Consider the following omp fragment.
...
  #pragma omp target
  #pragma omp parallel num_threads (2)
  #pragma omp task
    ;
...

This hangs at -O0 for nvptx.

Investigating the behaviour gives us the following trace of events:
- both threads execute GOMP_task, where they:
  - deposit a task, and
  - execute gomp_team_barrier_wake
- thread 1 executes gomp_team_barrier_wait_end and, not being the last thread,
  proceeds to wait at the team barrier
- thread 0 executes gomp_team_barrier_wait_end and, being the last thread, it
  calls gomp_barrier_handle_tasks, where it:
  - executes both tasks and marks the team barrier done
  - executes a gomp_team_barrier_wake which wakes up thread 1
- thread 1 exits the team barrier
- thread 0 returns from gomp_barrier_handle_tasks and goes to wait at
  the team barrier.
- thread 0 hangs.

To understand why there is a hang here, it's good to understand how things
are setup for nvptx.  The libgomp/config/nvptx/bar.c implementation is
a copy of the libgomp/config/linux/bar.c implementation, with uses of both
futex_wake and do_wait replaced with uses of ptx insn bar.sync:
...
  if (bar->total > 1)
    asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
...

The point where thread 0 goes to wait at the team barrier, corresponds in
the linux implementation with a do_wait.  In the linux case, the call to
do_wait doesn't hang, because it's waiting for bar->generation to become
a certain value, and if bar->generation already has that value, it just
proceeds, without any need for coordination with other threads.

In the nvtpx case, the bar.sync waits until thread 1 joins it in the same
logical barrier, which never happens: thread 1 is lingering in the
thread pool at the thread pool barrier (using a different logical barrier),
waiting to join a new team.

The easiest way to fix this is to revert to the posix implementation for
bar.{c,h}.  That however falls back on a busy-waiting approach, and
does not take advantage of the ptx bar.sync insn.

Instead, we revert to the linux implementation for bar.c,
and implement bar.c local functions futex_wait and futex_wake using the
bar.sync insn.

The bar.sync insn takes an argument specifying how many threads are
participating, and that doesn't play well with the futex syntax where it's
not clear in advance how many threads will be woken up.

This is solved by waking up all waiting threads each time a futex_wait or
futex_wake happens, and possibly going back to sleep with an updated thread
count.

Tested libgomp on x86_64 with nvptx accelerator.

libgomp/ChangeLog:

2021-04-20  Tom de Vries  <tdevries@suse.de>

	PR target/99555
	* config/nvptx/bar.c (generation_to_barrier): New function, copied
	from config/rtems/bar.c.
	(futex_wait, futex_wake): New function.
	(do_spin, do_wait): New function, copied from config/linux/wait.h.
	(gomp_barrier_wait_end, gomp_barrier_wait_last)
	(gomp_team_barrier_wake, gomp_team_barrier_wait_end):
	(gomp_team_barrier_wait_cancel_end, gomp_team_barrier_cancel): Remove
	and replace with include of config/linux/bar.c.
	* config/nvptx/bar.h (gomp_barrier_t): Add fields waiters and lock.
	(gomp_barrier_init): Init new fields.
	* testsuite/libgomp.c-c++-common/task-detach-6.c: Remove nvptx-specific
	workarounds.
	* testsuite/libgomp.c/pr99555-1.c: Same.
	* testsuite/libgomp.fortran/task-detach-6.f90: Same.
2022-02-22 15:48:03 +01:00
Tom de Vries
6263b656c8 [libgomp, testsuite, nvptx] Fix pr96390.c without CUDA
When running the libgomp testsuite on x86_64 with nvptx accelerator, we run into:
...
XPASS: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors)
FAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c execution test
...

The problem is that we're expecting the following ptxas error:
...
XFAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors)
Excess errors:
ptxas /tmp/ccZYDw8N.o, line 90; error   : Call to 'baz' requires call prototype
ptxas /tmp/ccZYDw8N.o, line 90; error   : Unknown symbol 'baz'
...

But it's not triggered because ptxas is not in the path, so nvptx-none-as
defaults to --no-verify.

So instead, we run into the same error at execution time.

Fix this by forcing verification using:
...
/* { dg-additional-options "-foffload=-Wa,--verify" \
     { target offload_target_nvptx } } */
...
such that we run into the xfail in this way instead:
...
XFAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors)
Excess errors:
nvptx-as: error trying to exec 'ptxas': execvp: No such file or directory
nvptx-as: ptxas returned 255 exit status
...

Tested on x86_64-linux with nvptx accelerator.

libgomp/ChangeLog:

2022-02-21  Tom de Vries  <tdevries@suse.de>

	PR testsuite/104146
	* testsuite/libgomp.c++/pr96390.C: Add additional-option
	-foffload=-Wa,--verify for nvptx.
	* testsuite/libgomp.c-c++-common/pr96390.c: Same.
2022-02-22 10:23:20 +01:00
Tobias Burnus
3939c1b112 Fortran/OpenMP: Fix depend-clause handling
gcc/fortran/ChangeLog:

	* trans-openmp.cc (gfc_trans_omp_clauses, gfc_trans_omp_depobj):
	Depend on the proper addr, for ptr/alloc depend on pointee.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/depend-4.f90: New test.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/depend-4.f90: New test.
	* gfortran.dg/gomp/depend-5.f90: New test.
2022-02-15 12:26:48 +01:00