This patch adds two new OpenMP runtime routines: omp_target_memcpy_async and
omp_target_memcpy_rect_async. Both functions are introduced in OpenMP 5.1 as
asynchronous variants of omp_target_memcpy and omp_target_memcpy_rect.
In contrast to the synchronous variants, the asynchronous functions have two
additional function parameters to allow the specification of task dependences:
int depobj_count
omp_depend_t *depobj_list
integer(c_int), value :: depobj_count
integer(omp_depend_kind), optional :: depobj_list(*)
The implementation splits the synchronous functions into two parts: (a) check
and (b) copy. Then (a) is used in the asynchronous functions for the sequential
part, and the actual copy process (b) is executed in a newly created task. The
sequential part (a) takes into account the requirements for the return values:
"The routine returns zero if successful. Otherwise, it returns a non-zero
value." (omp_target_memcpy_async, OpenMP 5.1 spec, section 3.8.7)
"An application can determine the number of inclusive dimensions supported by an
implementation by passing NULL pointers (or C_NULL_PTR, for Fortran) for both
dst and src. The routine returns the number of dimensions supported by the
implementation for the specified device numbers. No copy operation is
performed." (omp_target_memcpy_rect_async, OpenMP 5.1 spec, section 3.8.8)
Because the copy runs asynchronously, an error is raised if the asynchronous
memcpy is not successful (in contrast to the synchronous functions, which
report failure through a non-zero return value).
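The check/copy split described above can be illustrated with a small,
library-independent sketch. It is a simplified model and not the libgomp code:
every name in it (model_memcpy_check, model_memcpy_copy, model_memcpy,
model_memcpy_async, model_run_job, struct model_copy_job) is hypothetical,
plain memcpy stands in for the device copy, and a stored job stands in for the
task that the runtime would create.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the arguments of one deferred copy.  */
struct model_copy_job
{
  void *dst;
  const void *src;
  size_t length;
  int pending;
};

/* Part (a): validate the arguments; zero means the copy may proceed.  */
static int
model_memcpy_check (const void *dst, const void *src)
{
  return (dst == NULL || src == NULL) ? 1 : 0;
}

/* Part (b): perform the copy; returns zero on success.  */
static int
model_memcpy_copy (void *dst, const void *src, size_t length)
{
  memcpy (dst, src, length);
  return 0;
}

/* Synchronous variant: check, then copy, and report via the return value.  */
static int
model_memcpy (void *dst, const void *src, size_t length)
{
  int ret = model_memcpy_check (dst, src);
  return ret ? ret : model_memcpy_copy (dst, src, length);
}

/* Asynchronous variant: only the check runs immediately; the copy is
   recorded in *JOB and happens later.  */
static int
model_memcpy_async (struct model_copy_job *job, void *dst, const void *src,
                    size_t length)
{
  int ret = model_memcpy_check (dst, src);
  if (ret)
    return ret;
  job->dst = dst;
  job->src = src;
  job->length = length;
  job->pending = 1;
  return 0;
}

/* Body of the deferred "task".  The caller already received its return
   value, so in this model a failing copy can only be reported by aborting.  */
static void
model_run_job (struct model_copy_job *job)
{
  assert (job->pending);
  if (model_memcpy_copy (job->dst, job->src, job->length))
    abort ();
  job->pending = 0;
}
```

In this model a caller sees argument errors immediately from
model_memcpy_async, while the data only arrives once model_run_job has run.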
gcc/ChangeLog:
* omp-low.cc (omp_runtime_api_call): Added target_memcpy_async and
target_memcpy_rect_async to omp_runtime_apis array.
libgomp/ChangeLog:
* libgomp.map: Added omp_target_memcpy_async and
omp_target_memcpy_rect_async.
* libgomp.texi: Both functions are now supported.
* omp.h.in: Added omp_target_memcpy_async and
omp_target_memcpy_rect_async.
* omp_lib.f90.in: Added interfaces for both new functions.
* omp_lib.h.in: Likewise.
* target.c (ialias_redirect): Added for GOMP_task.
(omp_target_memcpy): Restructured into check and copy part.
(omp_target_memcpy_check): New helper function for omp_target_memcpy and
omp_target_memcpy_async that checks requirements.
(omp_target_memcpy_copy): New helper function for omp_target_memcpy and
omp_target_memcpy_async that performs the memcpy.
(omp_target_memcpy_async_helper): New helper function that is used in
omp_target_memcpy_async for the asynchronous task.
(omp_target_memcpy_async): Added.
(omp_target_memcpy_rect): Restructured into check and copy part.
(omp_target_memcpy_rect_check): New helper function for
omp_target_memcpy_rect and omp_target_memcpy_rect_async that checks
requirements.
(omp_target_memcpy_rect_copy): New helper function for
omp_target_memcpy_rect and omp_target_memcpy_rect_async that performs
the memcpy.
(omp_target_memcpy_rect_async_helper): New helper function that is used
in omp_target_memcpy_rect_async for the asynchronous task.
(omp_target_memcpy_rect_async): Added.
* task.c (ialias): Added for GOMP_task.
* testsuite/libgomp.c-c++-common/target-memcpy-async-1.c: New test.
* testsuite/libgomp.c-c++-common/target-memcpy-async-2.c: New test.
* testsuite/libgomp.c-c++-common/target-memcpy-rect-async-1.c: New test.
* testsuite/libgomp.c-c++-common/target-memcpy-rect-async-2.c: New test.
* testsuite/libgomp.fortran/target-memcpy-async-1.f90: New test.
* testsuite/libgomp.fortran/target-memcpy-async-2.f90: New test.
* testsuite/libgomp.fortran/target-memcpy-rect-async-1.f90: New test.
* testsuite/libgomp.fortran/target-memcpy-rect-async-2.f90: New test.
OpenMP 5.2 added the
"When called from within a target region the effect is unspecified."
restriction to omp_display_env, so it is OK not to support it in
target regions (worst case we could add an empty implementation
or one with __builtin_trap in there).
2022-05-17 Jakub Jelinek <jakub@redhat.com>
* libgomp.texi (OpenMP 5.1): Remove "Not inside target regions"
comment for omp_display_env feature.
This patch adds support for inoutset depend-kind in depend
clauses. It is very similar to the in depend-kind in that
a task with a dependency with that depend-kind is dependent
on all previously created sibling tasks with matching address
unless they have the same depend-kind.
In the in depend-kind case everything is dependent except
for in -> in dependency, for inoutset everything is
dependent except for inoutset -> inoutset dependency.
mutexinoutset is also similar (everything is dependent except
for mutexinoutset -> mutexinoutset dependency), but there is
also the additional restriction that only one task with
mutexinoutset for each address can be scheduled at once (i.e.
mutual exclusivity). For now we support mutexinoutset
the same as inout/out, but the inoutset support is full.
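The rule above can be written down as a small predicate. This is only a sketch
of the rule for two dependencies on the same address, not the libgomp
implementation; the enum values and the names model_must_wait and
model_must_wait_demo are assumptions made for illustration (the patch only
states that is_in becomes an unsigned char and that a dependency is ignored
when both sides carry the same non-zero value).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of the widened is_in field: 0 for out/inout and
   one distinct non-zero value each for in and inoutset.  */
enum model_is_in
{
  MODEL_OUT_OR_INOUT = 0,
  MODEL_IN = 1,
  MODEL_INOUTSET = 2
};

/* Return true if a task using depend-kind LATER on some address must wait
   for an earlier sibling task that used depend-kind EARLIER on the same
   address.  Only in -> in and inoutset -> inoutset pairs are exempt.  */
static bool
model_must_wait (unsigned char later, unsigned char earlier)
{
  if (later && later == earlier)
    return false;
  return true;
}

/* Usage example covering the combinations named in the text.  */
static void
model_must_wait_demo (void)
{
  assert (!model_must_wait (MODEL_IN, MODEL_IN));
  assert (!model_must_wait (MODEL_INOUTSET, MODEL_INOUTSET));
  assert (model_must_wait (MODEL_IN, MODEL_INOUTSET));
  assert (model_must_wait (MODEL_INOUTSET, MODEL_IN));
  assert (model_must_wait (MODEL_OUT_OR_INOUT, MODEL_OUT_OR_INOUT));
}
```

A plain bool cannot distinguish in from inoutset, which is why the comparison
needs more than two values.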
In order not to bump the ABI for dependencies each time
(we've bumped it already once, the old ABI supports only
inout/out and in depend-kind, the new ABI supports
inout/out, mutexinoutset, in and depobj), this patch arranges
for inoutset to be at least for the time being always handled
as if it was specified through depobj even when it is not.
So it uses the new ABI for that, and inoutset dependencies are represented
like depobj: a pointer to a pair of pointers where the first one
is the actual address of the object mentioned in the depend
clause and the second pointer is (void *) GOMP_DEPEND_INOUTSET.
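The pair-of-pointers representation can be sketched as follows. This is a
stand-alone model, not the code the compiler emits: MODEL_DEPEND_INOUTSET is a
placeholder value chosen for this sketch (the real constant is the
GOMP_DEPEND_INOUTSET defined in include/gomp-constants.h), and the helper
names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder tag value, assumed for illustration only.  */
#define MODEL_DEPEND_INOUTSET 5

/* Fill a depobj-like pair: the object's address, then the kind tag
   smuggled through a pointer-sized slot.  */
static void
model_fill_inoutset (void *pair[2], void *addr)
{
  pair[0] = addr;
  pair[1] = (void *) (uintptr_t) MODEL_DEPEND_INOUTSET;
}

/* A dependency entry is a pointer to such a pair; recover the address.  */
static void *
model_entry_addr (void **entry)
{
  assert (entry != NULL);
  return entry[0];
}

/* Recover the kind tag from the second slot of the pair.  */
static bool
model_entry_is_inoutset (void **entry)
{
  assert (entry != NULL);
  return (uintptr_t) entry[1] == (uintptr_t) MODEL_DEPEND_INOUTSET;
}
```

Because the kind travels inside the pair, a consumer of such entries can learn
about a new depend-kind without the layout of the dependency array changing.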
2022-05-17 Jakub Jelinek <jakub@redhat.com>
gcc/
* tree-core.h (enum omp_clause_depend_kind): Add
OMP_CLAUSE_DEPEND_INOUTSET.
* tree-pretty-print.cc (dump_omp_clause): Handle
OMP_CLAUSE_DEPEND_INOUTSET.
* gimplify.cc (gimplify_omp_depend): Likewise.
* omp-low.cc (lower_depend_clauses): Likewise.
gcc/c-family/
* c-omp.cc (c_finish_omp_depobj): Handle
OMP_CLAUSE_DEPEND_INOUTSET.
gcc/c/
* c-parser.cc (c_parser_omp_clause_depend): Parse
inoutset depend-kind.
(c_parser_omp_depobj): Likewise.
gcc/cp/
* parser.cc (cp_parser_omp_clause_depend): Parse
inoutset depend-kind.
(cp_parser_omp_depobj): Likewise.
* cxx-pretty-print.cc (cxx_pretty_printer::statement): Handle
OMP_CLAUSE_DEPEND_INOUTSET.
gcc/testsuite/
* c-c++-common/gomp/all-memory-1.c (boo): Add test with
inoutset depend-kind.
* c-c++-common/gomp/all-memory-2.c (boo): Likewise.
* c-c++-common/gomp/depobj-1.c (f1): Likewise.
(f2): Adjusted expected diagnostics.
* g++.dg/gomp/depobj-1.C (f4): Adjust expected diagnostics.
include/
* gomp-constants.h (GOMP_DEPEND_INOUTSET): Define.
libgomp/
* libgomp.h (struct gomp_task_depend_entry): Change is_in type
from bool to unsigned char.
* task.c (gomp_task_handle_depend): Handle GOMP_DEPEND_INOUTSET.
Ignore dependencies where
task->depend[i].is_in && task->depend[i].is_in == ent->is_in
rather than just task->depend[i].is_in && ent->is_in. Remember
whether GOMP_DEPEND_IN loop is needed and guard the loop with that
conditional.
(gomp_task_maybe_wait_for_dependencies): Handle GOMP_DEPEND_INOUTSET.
Ignore dependencies where elem.is_in && elem.is_in == ent->is_in
rather than just elem.is_in && ent->is_in.
* testsuite/libgomp.c-c++-common/depend-1.c (test): Add task with
inoutset depend-kind.
* testsuite/libgomp.c-c++-common/depend-2.c (test): Likewise.
* testsuite/libgomp.c-c++-common/depend-3.c (test): Likewise.
* testsuite/libgomp.c-c++-common/depend-inoutset-1.c: New test.
This patch adds support for list items in the has_device_addr clause whose type
is given by C++ template parameters.
gcc/cp/ChangeLog:
* pt.cc (tsubst_omp_clauses): Added OMP_CLAUSE_HAS_DEVICE_ADDR.
* semantics.cc (finish_omp_clauses): Added template decl processing.
libgomp/ChangeLog:
* testsuite/libgomp.c++/target-has-device-addr-7.C: New test.
* testsuite/libgomp.c++/target-has-device-addr-8.C: New test.
* testsuite/libgomp.c++/target-has-device-addr-9.C: New test.
For a non-descriptor array, map(A(n:m)) was mapped as
map(tofrom:A[n-1] [len: ...]) map(alloc:A [pointer assign, bias: ...])
with this patch, it is changed to
map(tofrom:A[n-1] [len: ...]) map(firstprivate:A [pointer assign, bias: ...])
The latter avoids an alloc - and also avoids the race condition with
nowait in the enclosed testcase. (Note: pedantically, the testcase is
invalid since OpenMP 5.1, violating the map clause restriction at [354:10-13].)
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_trans_omp_clauses): When mapping nondescriptor
array sections, use GOMP_MAP_FIRSTPRIVATE_POINTER instead of
GOMP_MAP_POINTER for the pointer attachment.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/target-nowait-array-section.f90: New test.
Including the GCC-shipped 'include/cuda/cuda.h' vs. system <cuda.h> and
'dlopen'ing the CUDA Driver library vs. linking it are separate concerns.
libgomp/
* plugin/Makefrag.am: Handle 'PLUGIN_NVPTX_DYNAMIC'.
* plugin/configfrag.ac (PLUGIN_NVPTX_DYNAMIC): Change
'AC_DEFINE_UNQUOTED' into 'AM_CONDITIONAL'.
* plugin/plugin-nvptx.c: Split 'PLUGIN_NVPTX_DYNAMIC' into
'PLUGIN_NVPTX_INCLUDE_SYSTEM_CUDA_H' and
'PLUGIN_NVPTX_LINK_LIBCUDA'.
* Makefile.in: Regenerate.
* config.h.in: Likewise.
* configure: Likewise.
The ugly part is that OpenMP 5.1 made omp_all_memory a reserved identifier
which isn't allowed to be used anywhere but in the depend clause; this is
against how everything else has been handled in OpenMP so far (where
some identifiers could have special meaning in some OpenMP clauses or
pragmas but not elsewhere).
The patch handles it by making it a conditional keyword (for -fopenmp
only) and emitting a better diagnostic when it is used in a primary
expression. Having nicer diagnostics when e.g. trying to do
int omp_all_memory;
or
int *omp_all_memory[10];
etc. would mean changing too many spots and hooking into name lookups
to reject declaring any such symbols would be too ugly and I'm afraid
there are way too many spots where one can introduce a name
(variables, functions, namespaces, struct, enum, enumerators, template
arguments, ...).
Otherwise, the handling is quite simple: normal depend clauses lower
into addresses of variables being handed over to the library, for
omp_all_memory I'm using NULL pointers. omp_all_memory can only be
used with inout or out depend kinds and means that a task is dependent
on all previously created sibling tasks that have any dependency (of
any depend kind) and that any later created sibling tasks will be
dependent on it if they have any dependency.
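The ordering rule for omp_all_memory can be modelled with a NULL address, as
in the lowering described above. The sketch below is a simplified model of the
rule for one pair of dependencies, not the libgomp task code; the type and
function names are hypothetical and only the in and out/inout depend kinds are
modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_kind
{
  MODEL_KIND_IN,
  MODEL_KIND_OUT_OR_INOUT
};

/* One dependency of a task.  A NULL address models omp_all_memory.  */
struct model_dep
{
  const void *addr;
  enum model_kind kind;
};

/* omp_all_memory is only valid with the out or inout depend kinds.  */
static bool
model_dep_valid (struct model_dep d)
{
  return d.addr != NULL || d.kind == MODEL_KIND_OUT_OR_INOUT;
}

/* Return true if a task with dependency LATER must wait for an earlier
   created sibling task with dependency EARLIER.  */
static bool
model_depends_on (struct model_dep later, struct model_dep earlier)
{
  assert (model_dep_valid (later) && model_dep_valid (earlier));
  /* omp_all_memory on either side orders the two tasks, whatever the
     other dependency is.  */
  if (later.addr == NULL || earlier.addr == NULL)
    return true;
  /* Ordinary dependencies only interact on a matching address.  */
  if (later.addr != earlier.addr)
    return false;
  /* Two readers of the same address do not order each other.  */
  return !(later.kind == MODEL_KIND_IN && earlier.kind == MODEL_KIND_IN);
}
```

Tasks without any depend clause are outside this model; per the description
above they are not affected by omp_all_memory.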
2022-05-12 Jakub Jelinek <jakub@redhat.com>
gcc/
* gimplify.cc (gimplify_omp_depend): Don't build_fold_addr_expr
if null_pointer_node.
(gimplify_scan_omp_clauses): Likewise.
* tree-pretty-print.cc (dump_omp_clause): Print null_pointer_node
as omp_all_memory.
gcc/c-family/
* c-common.h (enum rid): Add RID_OMP_ALL_MEMORY.
* c-omp.cc (c_finish_omp_depobj): Don't build_fold_addr_expr
if null_pointer_node.
gcc/c/
* c-parser.cc (c_parse_init): Register omp_all_memory as keyword
if flag_openmp.
(c_parser_postfix_expression): Diagnose uses of omp_all_memory
in postfix expressions.
(c_parser_omp_variable_list): Handle omp_all_memory in depend
clause.
* c-typeck.cc (c_finish_omp_clauses): Handle omp_all_memory
keyword in depend clause as null_pointer_node, diagnose invalid
uses.
gcc/cp/
* lex.cc (init_reswords): Register omp_all_memory as keyword
if flag_openmp.
* parser.cc (cp_parser_primary_expression): Diagnose uses of
omp_all_memory in postfix expressions.
(cp_parser_omp_var_list_no_open): Handle omp_all_memory in depend
clause.
* semantics.cc (finish_omp_clauses): Handle omp_all_memory
keyword in depend clause as null_pointer_node, diagnose invalid
uses.
* pt.cc (tsubst_omp_clause_decl): Pass through omp_all_memory.
gcc/testsuite/
* c-c++-common/gomp/all-memory-1.c: New test.
* c-c++-common/gomp/all-memory-2.c: New test.
* c-c++-common/gomp/all-memory-3.c: New test.
* g++.dg/gomp/all-memory-1.C: New test.
* g++.dg/gomp/all-memory-2.C: New test.
libgomp/
* libgomp.h (struct gomp_task): Add depend_all_memory member.
* task.c (gomp_init_task): Initialize depend_all_memory.
(gomp_task_handle_depend): Handle omp_all_memory.
(gomp_task_run_post_handle_depend_hash): Clear
parent->depend_all_memory if equal to current task.
(gomp_task_maybe_wait_for_dependencies): Handle omp_all_memory.
* testsuite/libgomp.c-c++-common/depend-1.c: New test.
* testsuite/libgomp.c-c++-common/depend-2.c: New test.
* testsuite/libgomp.c-c++-common/depend-3.c: New test.
This is only active if GCC is 'configure'd with '--with-hsa-runtime=[...]' or
'--with-hsa-runtime-include=[...]', '--with-hsa-runtime-lib=[...]' -- which
nobody really is doing, as far as I can tell.
As originally changed for the libgomp HSA plugin in
commit b8d89b03db (r242749)
"Remove build dependence on HSA run-time", and later propagated into the GCN
plugin, the plugin is no longer built against a system-provided HSA Runtime
library. Instead, it is unconditionally built against the GCC-shipped
'include/hsa*.h' header files, and at run time does
'dlopen("libhsa-runtime64.so.1")'. It thus doesn't make sense to consider
references to a system-provided HSA Runtime library during the libgomp GCN
plugin build.
libgomp/
* plugin/configfrag.ac (HSA_RUNTIME_CPPFLAGS)
(HSA_RUNTIME_LDFLAGS): Remove.
* configure: Regenerate.
This is only active if GCC is 'configure'd with '--with-hsa-runtime=[...]' or
'--with-hsa-runtime-lib=[...]' -- which nobody really is doing, as far as I can
tell.
'libgomp/testsuite/lib/libgomp.exp:libgomp_init' states:
# For build-tree testing, also consider the library paths used for builing.
# For installed testing, we assume all that to be provided in the sysroot.
if { $blddir != "" } {
[...]
global hsa_runtime_lib
if { $hsa_runtime_lib != "" } {
append always_ld_library_path ":$hsa_runtime_lib"
}
}
However, the libgomp GCN plugin is unconditionally built against the
GCC-shipped 'include/hsa*.h' header files, and at run time does
'dlopen("libhsa-runtime64.so.1")', so there is no system-provided HSA Runtime
library "used for builing". It thus doesn't make sense to amend
'LD_LIBRARY_PATH' for system-provided HSA Runtime library.
libgomp/
* testsuite/lib/libgomp.exp (libgomp_init): Don't
'append always_ld_library_path ":$hsa_runtime_lib"'.
* testsuite/libgomp-test-support.exp.in (hsa_runtime_lib): Don't set.
Fix-up for recent commit r13-116-g3f8c389fe90bf565a6221a46bb7fb745dd4c1510
"OpenMP: Fix use_device_{addr,ptr} with in-data-sharing arg", where we
currently get:
libgomp: use_device_ptr pointer wasn't mapped
FAIL: libgomp.fortran/use_device_addr-5.f90 -O execution test
libgomp/
* testsuite/libgomp.fortran/use_device_addr-5.f90: Fix up
multi-device testing.
While -foffload=-<flag> works (never documented legacy feature),
the documented way is to use -foffload-options=.
libgomp/ChangeLog:
* plugin/plugin-gcn.c (isa_matches_agent): Suggest -foffload-options.
For array-descriptor vars, the descriptor is assigned to a temporary. However,
this failed when the clause's argument was in turn in a data-sharing clause
as the outer context's VALUE_EXPR wasn't used.
gcc/ChangeLog:
* omp-low.cc (lower_omp_target): Fix use_device_{addr,ptr} with list
item that is in an outer data-sharing clause.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/use_device_addr-5.f90: New test.
Last fall I changed struct gomp_work_share, so that it doesn't have an
__attribute__((aligned (64))) lock member in the middle unless the target has
a non-emulated aligned allocator; otherwise it just makes sure the first and
second halves are 64 bytes apart for cache line reasons, but doesn't make
the struct 64-byte aligned itself and so we can use normal allocators for it.
When the struct isn't 64-byte aligned, the amount of tail padding significantly
decreases, to 0 or 4 bytes or so. The library uses that tail padding for
the ordered_team_ids array (array of uints) and/or the memory for lastprivate
conditional temporaries when they fit (the latter wants to guarantee long long
alignment).
The problem with it on ia32 darwin9 is that while the struct contains
long long members, long long is just 4-byte aligned while __alignof__(long long)
is 8. That causes problems in gomp_init_work_share, where we currently assume
that if offsetof (struct gomp_work_share, inline_ordered_team_ids) is long long
aligned, then that tail array will be aligned at runtime and so no extra
memory for dynamic realignment will be needed (that is false when the whole
struct doesn't have long long alignment). It also causes another problem in
the remaining hunks, where we compute INLINE_ORDERED_TEAM_IDS_OFF
as the above offsetof aligned up to a long long boundary and compute the
difference between sizeof (struct gomp_work_share) and
INLINE_ORDERED_TEAM_IDS_OFF.
When unlucky, the former isn't a multiple of 8 and the latter is 4 bigger
than that, and as the subtraction is done in size_t, we end up with (size_t) -4,
so the comparison doesn't really work.
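The wrap-around can be reproduced in isolation. The sketch below uses made-up
sizes (60 and 64) rather than the real layout of struct gomp_work_share, and
the function names are hypothetical; it only shows why a size_t subtraction
needs a guard before its result is compared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Unguarded check: wraps around when STRUCT_SIZE is smaller than IDS_OFF,
   so almost any NEEDED appears to fit.  */
static bool
model_fits_unguarded (size_t struct_size, size_t ids_off, size_t needed)
{
  return needed <= struct_size - ids_off;
}

/* Guarded check: only subtract when the result cannot wrap.  */
static bool
model_fits_guarded (size_t struct_size, size_t ids_off, size_t needed)
{
  return struct_size >= ids_off && needed <= struct_size - ids_off;
}

/* Usage example with an offset that is 4 bigger than the struct size.  */
static void
model_padding_demo (void)
{
  assert ((size_t) 60 - (size_t) 64 == (size_t) -4);
  assert (model_fits_unguarded (60, 64, 16));   /* Wrong answer.  */
  assert (!model_fits_guarded (60, 64, 16));    /* Correct answer.  */
  assert (model_fits_guarded (80, 64, 16));     /* Real padding fits.  */
}
```

When both operands are compile-time constants, as the sizeof and offsetof
based values are, the extra comparison can be folded away by the optimizer.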
The fixes add additional conditions to make it work properly, but all of them
should be evaluated at compile time when optimizing and so shouldn't slow
anything down.
2022-04-26 Jakub Jelinek <jakub@redhat.com>
PR libgomp/105358
* work.c (gomp_init_work_share): Don't mask off adjustment for
dynamic long long realignment if struct gomp_work_share has smaller
alignof than long long.
* loop.c (GOMP_loop_start): Don't use inline_ordered_team_ids if
struct gomp_work_share has smaller alignof than long long or if
sizeof (struct gomp_work_share) is smaller than
INLINE_ORDERED_TEAM_IDS_OFF.
* loop_ull.c (GOMP_loop_ull_start): Likewise.
* sections.c (GOMP_sections2_start): Likewise.
So that move_sese_region_to_fn works properly, OpenMP/OpenACC constructs
for which that function is invoked need an extra artificial BIND_EXPR
around their body so that we move all variables of the bodies.
The C/C++ FEs do that both for OpenMP constructs like OMP_PARALLEL, OMP_TASK
or OMP_TARGET and for OpenACC constructs that behave similarly to
OMP_TARGET, but the Fortran FE only does that for OpenMP constructs.
The following patch does that for OpenACC constructs too.
PR fortran/104717
gcc/fortran/
* trans-openmp.cc (gfc_trans_oacc_construct): Wrap construct body
in an extra BIND_EXPR.
gcc/testsuite/
* gfortran.dg/goacc/pr104717.f90: New test.
* gfortran.dg/goacc/privatization-1-compute-loop.f90: Adjust.
libgomp/
* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Adjust.
Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
This fixes a typo in the 5.0 feature support table.
2022-04-13 Jakub Jelinek <jakub@redhat.com>
* libgomp.texi: Fix a typo - mutexinouset -> mutexinoutset.
... so that it may be used by other projects that inherit GCC's 'include'
directory.
include/
* cuda/cuda.h: New file.
libgomp/
* plugin/cuda/cuda.h: Remove file.
* plugin/plugin-nvptx.c [PLUGIN_NVPTX_DYNAMIC]: Include
"cuda/cuda.h" instead of <cuda.h>.
* plugin/configfrag.ac <PLUGIN_NVPTX_DYNAMIC>: Don't set
'PLUGIN_NVPTX_CPPFLAGS'.
* configure: Regenerate.
This patch fixes a bug in lower_omp_target, where for Fortran arrays,
the expanded sender assignment is wrongly using the variable in the
current ctx, instead of the one looked-up outside, which is causing
use_device_ptr/addr to fail to work when used inside an omp-parallel
(where the omp child_fn is split away from the original).
The fix is inside omp-low.cc, though because the omp_array_data langhook
is used only by Fortran, this is essentially Fortran-specific.
2022-04-05 Chung-Lin Tang <cltang@codesourcery.com>
gcc/ChangeLog:
* omp-low.cc (lower_omp_target): Use outer context looked-up 'var' as
argument to lang_hooks.decls.omp_array_data, instead of 'ovar' from
current clause.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/use_device_ptr-4.f90: New testcase.
The test-cases libgomp.fortran/examples-4/declare_target-{1,2}.f90 mean to
set an nvptx-specific limit using offload_target_nvptx, but also change
behaviour for AMD GCN.
That is, there is now a difference in behaviour between:
- a compiler configured for GCN offloading, and
- a compiler configured for both GCN and nvptx offloading.
Fix this by using on_device_arch_nvptx instead.
Tested on x86_64 with nvptx accelerator.
libgomp/ChangeLog:
2022-04-04 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.fortran/examples-4/declare_target-1.f90: Use
on_device_arch_nvptx instead of offload_target_nvptx.
* testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Same.