Commit Graph

1675 Commits

Author SHA1 Message Date
Thomas Schwinge f8187b5c0d Fix OpenACC gang-redundant execution in 'libgomp.oacc-fortran/privatized-ref-2.f90'
This was a latent problem, and this commit here now resolves a regression that
after recent commit a78b1ab1df
"amdgcn: Tune default OpenMP/OpenACC GPU utilization" we had (only) seen on a
GCN offloading '-march=gfx908' system:

    {+WARNING: program timed out.+}
    [-PASS:-]{+FAIL:+} libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O0  execution test

Same for other optimization levels.

Make sure that we're not executing non-parallelized code in gang-redundant
mode, by putting these parts into their own 'parallel' constructs, which then
default to 'num_gangs(1)'.

	libgomp/
	* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Fix OpenACC
	gang-redundant execution.
2022-02-22 17:32:03 +01:00
Tom de Vries 5ed77fb3ed [libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end
Consider the following omp fragment.
...
  #pragma omp target
  #pragma omp parallel num_threads (2)
  #pragma omp task
    ;
...

This hangs at -O0 for nvptx.

Investigating the behaviour gives us the following trace of events:
- both threads execute GOMP_task, where they:
  - deposit a task, and
  - execute gomp_team_barrier_wake
- thread 1 executes gomp_team_barrier_wait_end and, not being the last thread,
  proceeds to wait at the team barrier
- thread 0 executes gomp_team_barrier_wait_end and, being the last thread, it
  calls gomp_barrier_handle_tasks, where it:
  - executes both tasks and marks the team barrier done
  - executes a gomp_team_barrier_wake which wakes up thread 1
- thread 1 exits the team barrier
- thread 0 returns from gomp_barrier_handle_tasks and goes to wait at
  the team barrier.
- thread 0 hangs.

To understand why there is a hang here, it's good to understand how things
are setup for nvptx.  The libgomp/config/nvptx/bar.c implementation is
a copy of the libgomp/config/linux/bar.c implementation, with uses of both
futex_wake and do_wait replaced with uses of ptx insn bar.sync:
...
  if (bar->total > 1)
    asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
...

The point where thread 0 goes to wait at the team barrier, corresponds in
the linux implementation with a do_wait.  In the linux case, the call to
do_wait doesn't hang, because it's waiting for bar->generation to become
a certain value, and if bar->generation already has that value, it just
proceeds, without any need for coordination with other threads.

In the nvtpx case, the bar.sync waits until thread 1 joins it in the same
logical barrier, which never happens: thread 1 is lingering in the
thread pool at the thread pool barrier (using a different logical barrier),
waiting to join a new team.

The easiest way to fix this is to revert to the posix implementation for
bar.{c,h}.  That however falls back on a busy-waiting approach, and
does not take advantage of the ptx bar.sync insn.

Instead, we revert to the linux implementation for bar.c,
and implement bar.c local functions futex_wait and futex_wake using the
bar.sync insn.

The bar.sync insn takes an argument specifying how many threads are
participating, and that doesn't play well with the futex syntax where it's
not clear in advance how many threads will be woken up.

This is solved by waking up all waiting threads each time a futex_wait or
futex_wake happens, and possibly going back to sleep with an updated thread
count.

Tested libgomp on x86_64 with nvptx accelerator.

libgomp/ChangeLog:

2021-04-20  Tom de Vries  <tdevries@suse.de>

	PR target/99555
	* config/nvptx/bar.c (generation_to_barrier): New function, copied
	from config/rtems/bar.c.
	(futex_wait, futex_wake): New function.
	(do_spin, do_wait): New function, copied from config/linux/wait.h.
	(gomp_barrier_wait_end, gomp_barrier_wait_last)
	(gomp_team_barrier_wake, gomp_team_barrier_wait_end):
	(gomp_team_barrier_wait_cancel_end, gomp_team_barrier_cancel): Remove
	and replace with include of config/linux/bar.c.
	* config/nvptx/bar.h (gomp_barrier_t): Add fields waiters and lock.
	(gomp_barrier_init): Init new fields.
	* testsuite/libgomp.c-c++-common/task-detach-6.c: Remove nvptx-specific
	workarounds.
	* testsuite/libgomp.c/pr99555-1.c: Same.
	* testsuite/libgomp.fortran/task-detach-6.f90: Same.
2022-02-22 15:48:03 +01:00
Tom de Vries 6263b656c8 [libgomp, testsuite, nvptx] Fix pr96390.c without CUDA
When running the libgomp testsuite on x86_64 with nvptx accelerator, we run into:
...
XPASS: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors)
FAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c execution test
...

The problem is that we're expecting the following ptxas error:
...
XFAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors)
Excess errors:
ptxas /tmp/ccZYDw8N.o, line 90; error   : Call to 'baz' requires call prototype
ptxas /tmp/ccZYDw8N.o, line 90; error   : Unknown symbol 'baz'
...

But it's not triggered because ptxas is not in the path, so nvptx-none-as
defaults to --no-verify.

So instead, we run into the same error at execution time.

Fix this by forcing verification using:
...
/* { dg-additional-options "-foffload=-Wa,--verify" \
     { target offload_target_nvptx } } */
...
such that we run into the xfail in this way instead:
...
XFAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors)
Excess errors:
nvptx-as: error trying to exec 'ptxas': execvp: No such file or directory
nvptx-as: ptxas returned 255 exit status
...

Tested on x86_64-linux with nvptx accelerator.

libgomp/ChangeLog:

2022-02-21  Tom de Vries  <tdevries@suse.de>

	PR testsuite/104146
	* testsuite/libgomp.c++/pr96390.C: Add additional-option
	-foffload=-Wa,--verify for nvptx.
	* testsuite/libgomp.c-c++-common/pr96390.c: Same.
2022-02-22 10:23:20 +01:00
GCC Administrator 875e493bf5 Daily bump. 2022-02-16 00:16:26 +00:00
Tobias Burnus 3939c1b112 Fortran/OpenMP: Fix depend-clause handling
gcc/fortran/ChangeLog:

	* trans-openmp.cc (gfc_trans_omp_clauses, gfc_trans_omp_depobj):
	Depend on the proper addr, for ptr/alloc depend on pointee.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/depend-4.f90: New test.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/depend-4.f90: New test.
	* gfortran.dg/gomp/depend-5.f90: New test.
2022-02-15 12:26:48 +01:00
GCC Administrator a645583d4d Daily bump. 2022-02-11 00:16:25 +00:00
Tobias Burnus c22f3fb780 OpenMP/C++: Permit mapping classes with virtual members [PR102204]
PR c++/102204
gcc/cp/ChangeLog:

	* decl2.cc (cp_omp_mappable_type_1): Remove check for virtual
	members as those are permitted since OpenMP 5.0.

libgomp/ChangeLog:

	* testsuite/libgomp.c++/target-virtual-1.C: New test.

gcc/testsuite/ChangeLog:

	* g++.dg/gomp/unmappable-1.C: Remove previously expected dg-message.
2022-02-10 19:03:42 +01:00
Marcel Vollweiler bbb7f8604e C, C++, Fortran, OpenMP: Add 'has_device_addr' clause to 'target' construct.
This patch adds the 'has_device_addr' clause to the OpenMP 'target' construct
which was introduced in OpenMP 5.1 (OpenMP API 5.1 specification pp. 197ff):

	has_device_addr(list)

"The has_device_addr clause indicates that its list items already have device
addresses and therefore they may be directly accessed from a target device.
If the device address of a list item is not for the device on which the target
region executes, accessing the list item inside the region results in
unspecified behavior. The list items may include array sections." (p. 200)

"A list item may not be specified in both an is_device_ptr clause and a
has_device_addr clause on the directive." (p. 202)

"A list item that appears in an is_device_ptr or a has_device_addr clause must
not be specified in any data-sharing attribute clause on the same target
construct." (p. 203)

gcc/c-family/ChangeLog:

	* c-omp.cc (c_omp_split_clauses): Added OMP_CLAUSE_HAS_DEVICE_ADDR case.
	* c-pragma.h (enum pragma_kind): Added 5.1 in comment.
	(enum pragma_omp_clause): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR.

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_clause_name): Parse 'has_device_addr'
	clause.
	(c_parser_omp_variable_list): Handle array sections.
	(c_parser_omp_clause_has_device_addr): Added.
	(c_parser_omp_all_clauses): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR
	case.
	(c_parser_omp_target_exit_data): Added HAS_DEVICE_ADDR to
	OMP_CLAUSE_MASK.
	* c-typeck.cc (handle_omp_array_sections): Handle clause restrictions.
	(c_finish_omp_clauses): Handle array sections.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_omp_clause_name): Parse 'has_device_addr' clause.
	(cp_parser_omp_var_list_no_open): Handle array sections.
	(cp_parser_omp_all_clauses): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR
	case.
	(cp_parser_omp_target_update): Added HAS_DEVICE_ADDR to OMP_CLAUSE_MASK.
	* semantics.cc (handle_omp_array_sections): Handle clause restrictions.
	(finish_omp_clauses): Handle array sections.

gcc/fortran/ChangeLog:

	* dump-parse-tree.cc (show_omp_clauses): Added OMP_LIST_HAS_DEVICE_ADDR
	case.
	* gfortran.h: Added OMP_LIST_HAS_DEVICE_ADDR.
	* openmp.cc (enum omp_mask2): Added OMP_CLAUSE_HAS_DEVICE_ADDR.
	(gfc_match_omp_clauses): Parse HAS_DEVICE_ADDR clause.
	(resolve_omp_clauses): Same.
	* trans-openmp.cc (gfc_trans_omp_variable_list): Added
	OMP_LIST_HAS_DEVICE_ADDR case.
	(gfc_trans_omp_clauses): Firstprivatize of array descriptors.

gcc/ChangeLog:

	* gimplify.cc (gimplify_scan_omp_clauses): Added cases for
	OMP_CLAUSE_HAS_DEVICE_ADDR
	and handle array sections.
	(gimplify_adjust_omp_clauses): Added OMP_CLAUSE_HAS_DEVICE_ADDR case.
	* omp-low.cc (scan_sharing_clauses): Handle OMP_CLAUSE_HAS_DEVICE_ADDR.
	(lower_omp_target): Same.
	* tree-core.h (enum omp_clause_code): Same.
	* tree-nested.cc (convert_nonlocal_omp_clauses): Same.
	(convert_local_omp_clauses): Same.
	* tree-pretty-print.cc (dump_omp_clause): Same.
	* tree.cc: Same.

libgomp/ChangeLog:

	* libgomp.texi: Updated entry for HAS_DEVICE_ADDR.
	* target.c (copy_firstprivate_data): Copy only if host address is not
	NULL.
	* testsuite/libgomp.c++/target-has-device-addr-2.C: New test.
	* testsuite/libgomp.c++/target-has-device-addr-4.C: New test.
	* testsuite/libgomp.c++/target-has-device-addr-5.C: New test.
	* testsuite/libgomp.c++/target-has-device-addr-6.C: New test.
	* testsuite/libgomp.c-c++-common/target-has-device-addr-1.c: New test.
	* testsuite/libgomp.c/target-has-device-addr-3.c: New test.
	* testsuite/libgomp.fortran/target-has-device-addr-1.f90: New test.
	* testsuite/libgomp.fortran/target-has-device-addr-2.f90: New test.
	* testsuite/libgomp.fortran/target-has-device-addr-3.f90: New test.
	* testsuite/libgomp.fortran/target-has-device-addr-4.f90: New test.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/clauses-1.c: Added has_device_addr to test cases.
	* g++.dg/gomp/attrs-1.C: Added has_device_addr to test cases.
	* g++.dg/gomp/attrs-2.C: Added has_device_addr to test cases.
	* c-c++-common/gomp/target-has-device-addr-1.c: New test.
	* c-c++-common/gomp/target-has-device-addr-2.c: New test.
	* c-c++-common/gomp/target-is-device-ptr-1.c: New test.
	* c-c++-common/gomp/target-is-device-ptr-2.c: New test.
	* gfortran.dg/gomp/is_device_ptr-3.f90: New test.
	* gfortran.dg/gomp/target-has-device-addr-1.f90: New test.
	* gfortran.dg/gomp/target-has-device-addr-2.f90: New test.
2022-02-09 23:47:12 -08:00
GCC Administrator 2a2fda2d9b Daily bump. 2022-02-09 00:16:24 +00:00
Jakub Jelinek 0af7ef050a libgomp: Fix segfault with posthumous orphan tasks [PR104385]
The following patch fixes crashes with posthumous orphan tasks.
When a parent task finishes, gomp_clear_parent clears the parent
pointers of its children tasks present in the parent->children_queue.
But children that are still waiting for dependencies aren't in that
queue yet, they will be added there only when the sibling they are
waiting for exits.  Unfortunately we were adding those tasks into
the queues with the original task->parent which then causes crashes
because that task is gone and freed.  The following patch fixes that
by clearing the parent field when we schedule such task for running
by adding it into the queues and we know that the sibling task which
is about to finish has NULL parent.

2022-02-08  Jakub Jelinek  <jakub@redhat.com>

	PR libgomp/104385
	* task.c (gomp_task_run_post_handle_dependers): If parent is NULL,
	clear task->parent.
	* testsuite/libgomp.c/pr104385.c: New test.
2022-02-08 09:30:17 +01:00
GCC Administrator 3c1cbde16e Daily bump. 2022-02-05 00:16:31 +00:00
Tobias Burnus f62156eab7 libgomp.fortran/allocate-1.f90: Fix minor cleanup
libgomp/ChangeLog:
	* testsuite/libgomp.fortran/allocate-1.f90: Remove spurious
	STOP of previous commit.
2022-02-04 17:31:21 +01:00
Tobias Burnus 6d49813501 libgomp.fortran/allocate-1.f90: Minor cleanup
libgomp/ChangeLog:
	* testsuite/libgomp.fortran/allocate-1.c (is_64bit_aligned): Renamed
	from is_64bit_aligned_.
	* testsuite/libgomp.fortran/allocate-1.f90: Fix interface decl
	and use it, more implicit none, remove unused argument.
2022-02-04 14:51:01 +01:00
GCC Administrator 682ede3959 Daily bump. 2022-02-04 00:16:24 +00:00
David Seifert 45ba6bf28b make `-Werror` optional in libatomic/libbacktrace/libgomp/libitm/libsanitizer
* `-Werror` can cause issues when a more recent version of GCC compiles
  an older version:
  - https://bugs.gentoo.org/229059
  - https://bugs.gentoo.org/475350
  - https://bugs.gentoo.org/667104

libatomic/ChangeLog:

	* configure.ac: Support --disable-werror.
	* configure: Regenerate.

libbacktrace/ChangeLog:

	* configure.ac: Support --disable-werror.
	* configure: Regenerate.

libgomp/ChangeLog:

	* configure.ac: Support --disable-werror.
	* configure: Regenerate.

libitm/ChangeLog:

	* configure.ac: Support --disable-werror.
	* configure: Regenerate.

libsanitizer/ChangeLog:

	* configure.ac: Support --disable-werror.
	* aclocal.m4: Include also ../config/warnings.m4.
	* libbacktrace/Makefile.am (WARN_FLAGS): Remove.
	* configure: Regenerate.
	* Makefile.in: Regenerate.
	* asan/Makefile.in: Regenerate.
	* hwasan/Makefile.in: Regenerate.
	* interception/Makefile.in: Regenerate.
	* libbacktrace/Makefile.in: Regenerate.
	* lsan/Makefile.in: Regenerate.
	* sanitizer_common/Makefile.in: Regenerate.
	* tsan/Makefile.in: Regenerate.
	* ubsan/Makefile.in: Regenerate.

Co-Authored-By: Jakub Jelinek <jakub@redhat.com>
2022-02-03 16:10:18 +01:00
GCC Administrator ae7e4af964 Daily bump. 2022-02-02 00:17:16 +00:00
Tom de Vries e0451f93d9 [nvptx] Add some support for .local atomics
The ptx insn atom doesn't support local memory.  In case of doing an atomic
operation on local memory, we run into:
...
operation not supported on global/shared address space
...
This is the cuGetErrorString message for CUDA_ERROR_INVALID_ADDRESS_SPACE.

The message is somewhat confusing given that actually the operation is not
supported on local address space.

Fix this by falling back on a non-atomic version when detecting
a frame-related memory operand.

This only solves some cases that are detected at compile-time.  It does
however fix the openacc private-atomic-* test-cases.

Tested on x86_64 with nvptx accelerator.

gcc/ChangeLog:

2022-01-27  Tom de Vries  <tdevries@suse.de>

	* config/nvptx/nvptx.md (define_insn "atomic_compare_and_swap<mode>_1")
	(define_insn "atomic_exchange<mode>")
	(define_insn "atomic_fetch_add<mode>")
	(define_insn "atomic_fetch_addsf")
	(define_insn "atomic_fetch_<logic><mode>"): Output non-atomic version
	if memory operands is frame-relative.

gcc/testsuite/ChangeLog:

2022-01-31  Tom de Vries  <tdevries@suse.de>

	* gcc.target/nvptx/stack-atomics-run.c: New test.

libgomp/ChangeLog:

2022-01-27  Tom de Vries  <tdevries@suse.de>

	* testsuite/libgomp.oacc-c-c++-common/private-atomic-1.c: Remove
	PR83812 workaround.
	* testsuite/libgomp.oacc-fortran/private-atomic-1-vector.f90: Same.
	* testsuite/libgomp.oacc-fortran/private-atomic-1-worker.f90: Same.
2022-02-01 19:28:24 +01:00
Tom de Vries d43fbc7d3f [libgomp, testsuite] Fix insufficient resources in test-cases
When running libgomp test-case broadcast-many.c on an nvptx accelerator
(T400, driver version 470.86), I run into:
...
libgomp: The Nvidia accelerator has insufficient resources to launch \
  'main$_omp_fn$0' with num_workers = 32 and vector_length = 32; \
  recompile the program with 'num_workers = x and vector_length = y' on \
  that offloaded region or '-fopenacc-dim=y' where x * y <= 896.

FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/broadcast-many.c \
  -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  \
  -O0  execution test
...

The error does not occur when using GOMP_NVPTX_JIT=-O0.

Fix this by using 896 / 32 == 28 workers for ACC_DEVICE_TYPE_nvidia.

Likewise for some other test-cases.

Tested libgomp on x86_64 with nvptx accelerator.

libgomp/ChangeLog:

2022-01-27  Tom de Vries  <tdevries@suse.de>

	* testsuite/libgomp.oacc-c-c++-common/broadcast-many.c: Reduce
	num_workers for nvidia accelerator to fix libgomp error 'insufficient
	resources'.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c:
	Same.
	* testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Same.
2022-02-01 08:15:00 +01:00
Tom de Vries be362d5e12 [libgomp, testsuite] Reduce recursion depth in declare_target-*.f90
When running the libgomp testsuite with GOMP_NVPTX_JIT=-O0 using an nvptx
accelerator (Nvidia T400, 2GB), I run into:
...
libgomp: cuCtxSynchronize error: unspecified launch failure \
  (perhaps abort was called)

libgomp: cuMemFree_v2 error: unspecified launch failure

libgomp: device finalization failed
FAIL: libgomp.fortran/examples-4/declare_target-1.f90   -O0  execution test
...

The test-case contains:
...
  ! Reduced from 25 to 23, otherwise execution runs out of thread stack on
  ! Nvidia Titan V.
  if (fib (23) /= fib_wrapper (23)) stop 2
...

Fix this by reducing the fib/fib_wrapper argument from 23 to 22.

Same for declare_target-2.f90.

Tested on x86_64 with nvptx accelerator.

libgomp/ChangeLog:

2022-01-27  Tom de Vries  <tdevries@suse.de>

	* testsuite/libgomp.fortran/examples-4/declare_target-1.f90: Reduce
	recursion depth.
	* testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Same.
2022-02-01 08:13:06 +01:00
GCC Administrator 1bb5266257 Daily bump. 2022-02-01 00:16:29 +00:00
Martin Liska c99a6eb015 Add mold detection for libs.
libatomic/ChangeLog:

	* acinclude.m4: Detect *_ld_is_mold and use it.
	* configure: Regenerate.

libgomp/ChangeLog:

	* acinclude.m4: Detect *_ld_is_mold and use it.
	* configure: Regenerate.

libitm/ChangeLog:

	* acinclude.m4: Detect *_ld_is_mold and use it.
	* configure: Regenerate.

libstdc++-v3/ChangeLog:

	* acinclude.m4: Detect *_ld_is_mold and use it.
	* configure: Regenerate.
2022-01-31 09:46:44 +01:00
GCC Administrator 99f17e996f Daily bump. 2022-01-28 00:16:32 +00:00
Tobias Burnus b2a0f3a454 libgomp.texi: Update OpenMP implementation status
libgomp/
	* libgomp.texi (OpenMP 5.0): Update implementation status.
2022-01-27 09:39:23 +01:00
GCC Administrator 9dd443578f Daily bump. 2022-01-22 00:16:26 +00:00
Thomas Schwinge 087e545747 Strengthen a few OpenACC test cases
Rather than rubber-stamp whatever requested vs. actual device kernel launch
configuration happens, actually (again) verify the requested values (modulo
expected variations).

This better highlights that "AMD GCN has an upper limit of 'num_workers(16)'",
and the deficiency that "AMD GCN uses the autovectorizer for the vector
dimension: the use of a function call in vector-partitioned code [...] is not
currently supported".

And, this removes several instances of race conditions, where variables are
concurrently written to in OpenACC gang-redundant mode.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c: Strengthen.
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-v-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-v-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c: Likewise.
2022-01-21 18:45:30 +01:00
GCC Administrator fe1ad14165 Daily bump. 2022-01-20 00:16:54 +00:00
Marcel Vollweiler 0bd247bbbe libgomp, OpenMP: Fix issue for omp_get_device_num on gcn targets.
Currently omp_get_device_num does not work on gcn targets with more than one
offload device. The reason is that GOMP_DEVICE_NUM_VAR is static in
icv-device.c and thus "__gomp_device_num" is not visible in the offload image.

This patch removes "static" such that "__gomp_device_num" is now part of the
offload image and can now be found in GOMP_OFFLOAD_load_image in the plugin.

This is not an issue for nvptx. There, "__gomp_device_num" is in the offload
image even with "static".

libgomp/ChangeLog:

	* config/gcn/icv-device.c: Make GOMP_DEVICE_NUM_VAR public (remove
	"static") to make the device num available in the offload image.
2022-01-19 05:03:54 -08:00
Martin Liska 2aea19bdb1 nvptx: update fix for -Wformat-diag
gcc/ChangeLog:

	* config/nvptx/nvptx.cc (nvptx_goacc_validate_dims_1): Update
	warning messages.

libgomp/ChangeLog:

	* testsuite/libgomp.oacc-c++/privatized-ref-2.C: Update scanning
	patterns.
	* testsuite/libgomp.oacc-c++/privatized-ref-3.C: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/pr85486.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/pr95270-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-nohost-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/struct-copyout-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/struct-copyout-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/vector-length-64-1.c:
	Likewise.
	* testsuite/libgomp.oacc-fortran/attach-descriptor-1.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/derivedtypes-arrays-1.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-loop-2.f95: Likewise.
	* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: Likewise.

Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
2022-01-19 08:27:00 +01:00
GCC Administrator 7a761ae658 Daily bump. 2022-01-19 00:16:32 +00:00
Martin Liska b1f3640912 nvptx: fix -Wformat-diag warnings
gcc/ChangeLog:

	* config/nvptx/nvptx.cc (nvptx_goacc_validate_dims_1): Wrap
	keyword.
	* config/nvptx/nvptx.md: Remove trailing dot.

libgomp/ChangeLog:

	* testsuite/libgomp.oacc-c++/privatized-ref-2.C: Update keyword
	in dg-warning.
	* testsuite/libgomp.oacc-c++/privatized-ref-3.C: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/pr85486.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/pr95270-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-nohost-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/struct-copyout-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/struct-copyout-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/vector-length-64-1.c: Likewise.
	* testsuite/libgomp.oacc-fortran/attach-descriptor-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/derivedtypes-arrays-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-loop-2.f95: Likewise.
	* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: Likewise.
2022-01-18 17:25:36 +01:00
GCC Administrator fc82978278 Daily bump. 2022-01-18 00:16:54 +00:00
Thomas Schwinge b75aab194e Extend test cases for references in OpenACC 'private' clauses
libgomp/
	* testsuite/libgomp.oacc-c++/privatized-ref-2.C: Extend.
	* testsuite/libgomp.oacc-c++/privatized-ref-3.C: Likewise.
	* testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: Likewise.
2022-01-17 08:57:27 +01:00
Julian Brown fbb438808e Test cases for references in OpenACC 'private' clauses
libgomp/
	* testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: New test.
	* testsuite/libgomp.oacc-c++/privatized-ref-2.C: New test.
	* testsuite/libgomp.oacc-c++/privatized-ref-3.C: New test.

Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
2022-01-17 08:57:20 +01:00
GCC Administrator 1e942d7c05 Daily bump. 2022-01-17 00:16:24 +00:00
Kwok Cheung Yeung a78b1ab1df amdgcn: Tune default OpenMP/OpenACC GPU utilization
libgomp/
	* plugin/plugin-gcn.c (parse_target_attributes): Automatically set
	the number of teams and threads if necessary.
	(gcn_exec): Automatically set the number of gangs and workers if
	necessary.

Co-Authored-By: Andrew Stubbs  <ams@codesourcery.com>
2022-01-16 17:25:36 +01:00
GCC Administrator ad3f0d0806 Daily bump. 2022-01-14 00:16:30 +00:00
Hafiz Abid Qadeer 69561fc781 Add support for allocate clause (OpenMP 5.0).
This patch adds support for OpenMP 5.0 allocate clause for fortran. It does not
yet support the allocator-modifier as specified in OpenMP 5.1. The allocate
clause is already supported in C/C++.

gcc/fortran/ChangeLog:

	* dump-parse-tree.c (show_omp_clauses): Handle OMP_LIST_ALLOCATE.
	* gfortran.h (OMP_LIST_ALLOCATE): New enum value.
	* openmp.c (enum omp_mask1): Add OMP_CLAUSE_ALLOCATE.
	(gfc_match_omp_clauses): Handle OMP_CLAUSE_ALLOCATE
	(OMP_PARALLEL_CLAUSES, OMP_DO_CLAUSES, OMP_SECTIONS_CLAUSES)
	(OMP_TASK_CLAUSES, OMP_TASKLOOP_CLAUSES, OMP_TARGET_CLAUSES)
	(OMP_TEAMS_CLAUSES, OMP_DISTRIBUTE_CLAUSES)
	(OMP_SINGLE_CLAUSES): Add OMP_CLAUSE_ALLOCATE.
	(OMP_TASKGROUP_CLAUSES): New.
	(gfc_match_omp_taskgroup): Use OMP_TASKGROUP_CLAUSES instead of
	OMP_CLAUSE_TASK_REDUCTION.
	(resolve_omp_clauses): Handle OMP_LIST_ALLOCATE.
	(resolve_omp_do): Avoid warning when loop iteration variable is
	in allocate clause.
	* trans-openmp.c (gfc_trans_omp_clauses): Handle translation of
	allocate clause.
	(gfc_split_omp_clauses): Update for OMP_LIST_ALLOCATE.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/allocate-1.f90: New test.
	* gfortran.dg/gomp/allocate-2.f90: New test.
	* gfortran.dg/gomp/allocate-3.f90: New test.
	* gfortran.dg/gomp/collapse1.f90: Update error message.
	* gfortran.dg/gomp/openmp-simd-4.f90: Likewise.
	* gfortran.dg/gomp/clauses-1.f90: Uncomment allocate clause.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/allocate-1.c: New test.
	* testsuite/libgomp.fortran/allocate-1.f90: New test.
	* libgomp.texi: Remove string that says that allocate clause
	support is for C/C++ only.
2022-01-13 18:57:05 +00:00
Thomas Schwinge d97364aab1 Improve Intel MIC offloading XFAILing for 'omp_get_device_num'
After recent commit be661959a6
"libgomp/testsuite: Improve omp_get_device_num() tests", we're now iterating
over all OpenMP target devices.  Intel MIC (emulated) offloading still doesn't
properly implement device-side 'omp_get_device_num', and we thus regress:

    PASS: libgomp.c/../libgomp.c-c++-common/target-45.c (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/target-45.c execution test

    PASS: libgomp.c++/../libgomp.c-c++-common/target-45.c (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.c++/../libgomp.c-c++-common/target-45.c execution test

    PASS: libgomp.fortran/target10.f90   -O0  (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -O0  execution test
    PASS: libgomp.fortran/target10.f90   -O1  (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -O1  execution test
    PASS: libgomp.fortran/target10.f90   -O2  (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -O2  execution test
    PASS: libgomp.fortran/target10.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
    PASS: libgomp.fortran/target10.f90   -O3 -g  (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -O3 -g  execution test
    PASS: libgomp.fortran/target10.f90   -Os  (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -Os  execution test

Improve the XFAILing added in commit bb75b22aba
"Allow matching Intel MIC in OpenMP 'declare variant'" for the case that *any*
Intel MIC offload device is available.

	libgomp/
	* testsuite/libgomp.c-c++-common/on_device_arch.h
	(any_device_arch, any_device_arch_intel_mic): New.
	* testsuite/lib/libgomp.exp
	(check_effective_target_offload_device_any_intel_mic): New.
	* testsuite/libgomp.c-c++-common/target-45.c: Use it.
	* testsuite/libgomp.fortran/target10.f90: Likewise.
2022-01-13 13:09:36 +01:00
Thomas Schwinge 2edbcaed95 Document current '-Wuninitialized' diagnostics for 'libgomp.oacc-fortran/routine-10.f90' [PR102192]
libgomp/
	PR tree-optimization/102192
	* testsuite/libgomp.oacc-fortran/routine-10.f90: Document current
	'-Wuninitialized' diagnostics.
2022-01-13 11:52:35 +01:00
Thomas Schwinge 4bd8b1e881 Document current '-Wuninitialized'/'-Wmaybe-uninitialized' diagnostics for OpenACC test cases
... including "note: '[...]' was declared here" emitted since recent
commit 9695e1c23b
"Improve -Wuninitialized note location".

For those that seemed incorrect to me, I've placed XFAILed 'dg-bogus'es,
including one more instance of PR77504 etc., and several instances where
for "local variables" of reference-data-type reductions (etc.?) we emit
bogus (?) diagnostics.

For implicit data clauses (including 'firstprivate'), we seem to be missing
diagnostics, so I've placed XFAILed 'dg-warning's.

	gcc/testsuite/
	* c-c++-common/goacc/builtin-goacc-parlevel-id-size.c: Document
	current '-Wuninitialized' diagnostics.
	* c-c++-common/goacc/mdc-1.c: Likewise.
	* c-c++-common/goacc/nested-reductions-1-kernels.c: Likewise.
	* c-c++-common/goacc/nested-reductions-1-parallel.c: Likewise.
	* c-c++-common/goacc/nested-reductions-1-routine.c: Likewise.
	* c-c++-common/goacc/nested-reductions-2-kernels.c: Likewise.
	* c-c++-common/goacc/nested-reductions-2-parallel.c: Likewise.
	* c-c++-common/goacc/nested-reductions-2-routine.c: Likewise.
	* c-c++-common/goacc/uninit-dim-clause.c: Likewise.
	* c-c++-common/goacc/uninit-firstprivate-clause.c: Likewise.
	* c-c++-common/goacc/uninit-if-clause.c: Likewise.
	* gfortran.dg/goacc/array-with-dt-1.f90: Likewise.
	* gfortran.dg/goacc/array-with-dt-2.f90: Likewise.
	* gfortran.dg/goacc/array-with-dt-3.f90: Likewise.
	* gfortran.dg/goacc/array-with-dt-4.f90: Likewise.
	* gfortran.dg/goacc/array-with-dt-5.f90: Likewise.
	* gfortran.dg/goacc/derived-chartypes-1.f90: Likewise.
	* gfortran.dg/goacc/derived-chartypes-2.f90: Likewise.
	* gfortran.dg/goacc/derived-chartypes-3.f90: Likewise.
	* gfortran.dg/goacc/derived-chartypes-4.f90: Likewise.
	* gfortran.dg/goacc/derived-classtypes-1.f95: Likewise.
	* gfortran.dg/goacc/derived-types-2.f90: Likewise.
	* gfortran.dg/goacc/host_data-tree.f95: Likewise.
	* gfortran.dg/goacc/kernels-tree.f95: Likewise.
	* gfortran.dg/goacc/modules.f95: Likewise.
	* gfortran.dg/goacc/nested-reductions-1-kernels.f90: Likewise.
	* gfortran.dg/goacc/nested-reductions-1-parallel.f90: Likewise.
	* gfortran.dg/goacc/nested-reductions-1-routine.f90: Likewise.
	* gfortran.dg/goacc/nested-reductions-2-kernels.f90: Likewise.
	* gfortran.dg/goacc/nested-reductions-2-parallel.f90: Likewise.
	* gfortran.dg/goacc/nested-reductions-2-routine.f90: Likewise.
	* gfortran.dg/goacc/parallel-tree.f95: Likewise.
	* gfortran.dg/goacc/pr93464.f90: Likewise.
	* gfortran.dg/goacc/privatization-1-compute-loop.f90: Likewise.
	* gfortran.dg/goacc/privatization-1-compute.f90: Likewise.
	* gfortran.dg/goacc/privatization-1-routine_gang-loop.f90:
	Likewise.
	* gfortran.dg/goacc/privatization-1-routine_gang.f90: Likewise.
	* gfortran.dg/goacc/uninit-dim-clause.f95: Likewise.
	* gfortran.dg/goacc/uninit-firstprivate-clause.f95: Likewise.
	* gfortran.dg/goacc/uninit-if-clause.f95: Likewise.
	* gfortran.dg/goacc/uninit-use-device-clause.f95: Likewise.
	* gfortran.dg/goacc/wait.f90: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Document
	current '-Wuninitialized' diagnostics.
	* testsuite/libgomp.oacc-fortran/data-5.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/gemm-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/gemm.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/optional-reduction.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/parallel-reduction.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/pr70643.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/pr96628-part1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-5.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-7.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reference-reductions.f90:
	Likewise.
2022-01-13 11:52:35 +01:00
Thomas Schwinge 9fcc3a1dd2 Host and offload targets have no common meaning of address spaces
gcc/
	* tree-streamer-out.c (pack_ts_base_value_fields): Don't pack
	'TYPE_ADDR_SPACE' for offloading.
	* tree-streamer-in.c (unpack_ts_base_value_fields): Don't unpack
	'TYPE_ADDR_SPACE' for offloading.
	libgomp/
	* testsuite/libgomp.c/address-space-1.c: Remove 'dg-xfail-run-if'
	for 'offload_device_intel_mic'.
2022-01-13 11:16:20 +01:00
Julian Brown e52253bcc0 Wait at end of OpenACC asynchronous kernels regions
In OpenACC 'kernels' decomposition, we're improperly nesting synchronous and
asynchronous data and compute regions, giving rise to data races when the
asynchronicity is actually executed, as is visible in at least on test case
with GCN offloading.

The proper fix is to correctly use the asynchronous interfaces, making the
currently synchronous data regions fully asynchronous (see also
<https://gcc.gnu.org/PR97390> "[OpenACC] 'async' clause on 'data' construct",
which is to share the same implementation), but that's for later; for now add
some more synchronization.

	gcc/
	* omp-oacc-kernels-decompose.cc (add_wait): New function, split out
	of...
	(add_async_clauses_and_wait): ...here. Call new outlined function.
	(decompose_kernels_region_body): Add wait at the end of
	explicitly-asynchronous kernels regions.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Remove GCN
	offloading execution XFAIL.

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2022-01-13 10:42:17 +01:00
Thomas Schwinge 9b32c1669a OpenACC 'kernels' decomposition: Mark variables used in synthesized data clauses as addressable [PR100280]
... as otherwise 'gcc/omp-low.c:lower_omp_target' has to create a temporary:

    13073			else if (is_gimple_reg (var))
    13074			  {
    13075			    gcc_assert (offloaded);
    13076			    tree avar = create_tmp_var (TREE_TYPE (var));
    13077			    mark_addressable (avar);

..., which (a) is only implemented for actualy *offloaded* regions (but not
data regions), and (b) the subsequently synthesized code for writing to and
later reading back from the temporary fundamentally conflicts with OpenACC
'async' (as used by OpenACC 'kernels' decomposition).  That's all not trivial
to make work, so let's just avoid this case.

	gcc/
	PR middle-end/100280
	* omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region):
	Mark variables used in synthesized data clauses as addressable.
	gcc/testsuite/
	PR middle-end/100280
	* c-c++-common/goacc/kernels-decompose-pr100280-1.c: New.
	* c-c++-common/goacc/classify-kernels-parloops.c: Likewise.
	* c-c++-common/goacc/classify-kernels-unparallelized-parloops.c:
	Likewise.
	* c-c++-common/goacc/classify-kernels-unparallelized.c: Test
	'--param openacc-kernels=decompose'.
	* c-c++-common/goacc/classify-kernels.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-2.c: Update.
	* c-c++-common/goacc/kernels-decompose-ice-1.c: Remove.
	* c-c++-common/goacc/kernels-decompose-ice-2.c: Likewise.
	* gfortran.dg/goacc/classify-kernels-parloops.f95: New.
	* gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95:
	Likewise.
	* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Test
	'--param openacc-kernels=decompose'.
	* gfortran.dg/goacc/classify-kernels.f95: Likewise.
	libgomp/
	PR middle-end/100280
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
	Update.
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Likewise.

Suggested-by: Julian Brown <julian@codesourcery.com>
2022-01-13 10:42:17 +01:00
Thomas Schwinge 862e5f398b Enhance OpenACC 'kernels' decomposition testing
gcc/testsuite/
	* c-c++-common/goacc/kernels-decompose-1.c: Enhance.
	* c-c++-common/goacc/kernels-decompose-2.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-ice-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-ice-2.c: Likewise.
	* gfortran.dg/goacc/kernels-decompose-1.f95: Likewise.
	* gfortran.dg/goacc/kernels-decompose-2.f95: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
	Enhance.
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Likewise.
	* testsuite/libgomp.oacc-fortran/asyncwait-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/asyncwait-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/asyncwait-3.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise.
2022-01-13 10:42:17 +01:00
GCC Administrator 7d11b64b18 Daily bump. 2022-01-05 00:16:52 +00:00
Tobias Burnus be661959a6 libgomp/testsuite: Improve omp_get_device_num() tests
Related to r12-6208-gebc853deb7cc0487de9ef6e891a007ba853d1933
"libgomp: Fix GOMP_DEVICE_NUM_VAR stringification during offload image load"

That commit fixed an issue with omp_get_device_num() on gcn/nvptx that
resulted in having always the value 0.
This commit modifies the tests to iterate over all devices such that on a
multi-nonhost-device system it had detected that always-zero issue.

libgomp/ChangeLog:

	* testsuite/libgomp.c-c++-common/target-45.c: Iterate over all devices.
	* testsuite/libgomp.fortran/target10.f90: Likewise.
2022-01-04 14:58:06 +01:00
Chung-Lin Tang fbb592407c libgomp: Fix GOMP_DEVICE_NUM_VAR stringification during offload image load
In the patch that implemented omp_get_device_num(), there was an error where
the stringification of GOMP_DEVICE_NUM_VAR, which is the macro expanding to
the actual symbol used, was erroneously using the STRINGX() macro in the
libgomp offload image symbol search, and expansion of the variable name
string through the additional layer of preprocessor symbol was not properly
achieved.

This patch fixes this by changing to properly use XSTRING(), also from
include/symcat.h.

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Change uses of STRINGX
	into XSTRING when looking for GOMP_DEVICE_NUM_VAR in offload image.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise.
2022-01-04 17:26:23 +08:00
GCC Administrator a4ae8c3701 Daily bump. 2022-01-04 00:16:40 +00:00
Jakub Jelinek 7adcbafe45 Update copyright years. 2022-01-03 10:42:10 +01:00
Jakub Jelinek 877e3c2abf Update Copyright in ChangeLog files
Do this separately from all other Copyright updates, as ChangeLog files
can be modified only separately.
2022-01-03 10:31:39 +01:00