Commit Graph

1782 Commits

Author SHA1 Message Date
Thomas Schwinge de6e81ea96 OpenACC 'kernels' decomposition: Move 'TREE_ADDRESSABLE' setting into OMP lowering [PR100280]
... in preparation for later changes.  No functional change.

Follow-up to commit 9b32c1669a
"OpenACC 'kernels' decomposition: Mark variables used in
synthesized data clauses as addressable [PR100280]".

	PR middle-end/100280
	gcc/
	* tree.h (OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE): New.
	* tree-core.h: Document it.
	* omp-low.cc (scan_sharing_clauses) <OMP_CLAUSE_MAP>: Handle
	'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE'.
	* omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region):
	Set 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' instead of
	'TREE_ADDRESSABLE'.
	gcc/testsuite/
	* c-c++-common/goacc/classify-kernels-unparallelized.c: Adjust.
	* c-c++-common/goacc/classify-kernels.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-2.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr100280-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-2.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Adjust.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Likewise.
2022-03-04 14:21:01 +01:00
Thomas Schwinge e5ae22c561 Add diagnostic: "note: OpenACC 'kernels' decomposition: variable '[...]' declared in block made addressable" [PR100280]
Follow-up to commit 9b32c1669a
"OpenACC 'kernels' decomposition: Mark variables used in
synthesized data clauses as addressable [PR100280]".

	PR middle-end/100280
	gcc/
	* omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region):
	Add diagnostic: "note: OpenACC 'kernels' decomposition: variable
	'[...]' declared in block made addressable".
	gcc/testsuite/
	* c-c++-common/goacc/classify-kernels-unparallelized.c: Add
	'--param=openacc-privatization=noisy'.
	* c-c++-common/goacc/classify-kernels.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-2.c: Adjust.
	* c-c++-common/goacc/kernels-decompose-pr100280-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-2.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Adjust.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Likewise.
2022-03-04 14:21:00 +01:00
GCC Administrator a35f16971b Daily bump. 2022-03-01 00:16:28 +00:00
Tom de Vries f485b0ed7d [libgomp, testsuite, nvptx] Add -mptx=_ in declare-variant-3-sm*.c
When running with target board unix/-foffload=-mptx=3.1, we run into:
...
lto1: error: PTX version (-mptx) needs to be at least 4.2 to support \
  selected -misa (sm_53)^M
mkoffload: fatal error: x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned \
  1 exit status^M
compilation terminated.^M
  ...
FAIL: libgomp.c/declare-variant-3-sm53.c (test for excess errors)
...

Fix this by adding -foffload=-mptx=_ in the libgomp.c/declare-variant-3-sm*.c
test-cases.

Tested on x86_64 with nvptx accelerator.

libgomp/ChangeLog:

2022-02-28  Tom de Vries  <tdevries@suse.de>

	* testsuite/libgomp.c/declare-variant-3-sm30.c: Add -foffload=-mptx=_.
	* testsuite/libgomp.c/declare-variant-3-sm35.c: Same.
	* testsuite/libgomp.c/declare-variant-3-sm53.c: Same.
	* testsuite/libgomp.c/declare-variant-3-sm70.c: Same.
	* testsuite/libgomp.c/declare-variant-3-sm75.c: Same.
	* testsuite/libgomp.c/declare-variant-3-sm80.c: Same.
2022-02-28 10:10:51 +01:00
GCC Administrator 756a61851c Daily bump. 2022-02-25 00:16:20 +00:00
Tom de Vries 59b8ade887 [libgomp, testsuite, nvptx] Add libgomp.c/declare-variant-3-sm*.c
Add openmp test-cases that test the omp declare variant construct:
...
  #pragma omp declare variant (f30) match (device={isa("sm_30")})
...
using the available nvptx isas.

Only the one for sm_30 is a dg-do run test-case, the other ones are dg-do
link.

Tested on x86_64 with nvptx accelerator.

libgomp/ChangeLog:

2022-02-24  Tom de Vries  <tdevries@suse.de>

	* testsuite/libgomp.c/declare-variant-3-sm30.c: New test.
	* testsuite/libgomp.c/declare-variant-3-sm35.c: New test.
	* testsuite/libgomp.c/declare-variant-3-sm53.c: New test.
	* testsuite/libgomp.c/declare-variant-3-sm70.c: New test.
	* testsuite/libgomp.c/declare-variant-3-sm75.c: New test.
	* testsuite/libgomp.c/declare-variant-3-sm80.c: New test.
	* testsuite/libgomp.c/declare-variant-3.h: New header file.
2022-02-24 11:41:03 +01:00
GCC Administrator 2cfb33fc1e Daily bump. 2022-02-23 00:16:24 +00:00
Thomas Schwinge f8187b5c0d Fix OpenACC gang-redundant execution in 'libgomp.oacc-fortran/privatized-ref-2.f90'
This was a latent problem, and this commit here now resolves a regression that
after recent commit a78b1ab1df
"amdgcn: Tune default OpenMP/OpenACC GPU utilization" we had (only) seen on a
GCN offloading '-march=gfx908' system:

    {+WARNING: program timed out.+}
    [-PASS:-]{+FAIL:+} libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O0  execution test

Same for other optimization levels.

Make sure that we're not executing non-parallelized code in gang-redundant
mode, by putting these parts into their own 'parallel' constructs, which then
default to 'num_gangs(1)'.

	libgomp/
	* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Fix OpenACC
	gang-redundant execution.
2022-02-22 17:32:03 +01:00
Tom de Vries 5ed77fb3ed [libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end
Consider the following omp fragment.
...
  #pragma omp target
  #pragma omp parallel num_threads (2)
  #pragma omp task
    ;
...

This hangs at -O0 for nvptx.

Investigating the behaviour gives us the following trace of events:
- both threads execute GOMP_task, where they:
  - deposit a task, and
  - execute gomp_team_barrier_wake
- thread 1 executes gomp_team_barrier_wait_end and, not being the last thread,
  proceeds to wait at the team barrier
- thread 0 executes gomp_team_barrier_wait_end and, being the last thread, it
  calls gomp_barrier_handle_tasks, where it:
  - executes both tasks and marks the team barrier done
  - executes a gomp_team_barrier_wake which wakes up thread 1
- thread 1 exits the team barrier
- thread 0 returns from gomp_barrier_handle_tasks and goes to wait at
  the team barrier.
- thread 0 hangs.

To understand why there is a hang here, it's good to understand how things
are setup for nvptx.  The libgomp/config/nvptx/bar.c implementation is
a copy of the libgomp/config/linux/bar.c implementation, with uses of both
futex_wake and do_wait replaced with uses of ptx insn bar.sync:
...
  if (bar->total > 1)
    asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
...

The point where thread 0 goes to wait at the team barrier, corresponds in
the linux implementation with a do_wait.  In the linux case, the call to
do_wait doesn't hang, because it's waiting for bar->generation to become
a certain value, and if bar->generation already has that value, it just
proceeds, without any need for coordination with other threads.

In the nvtpx case, the bar.sync waits until thread 1 joins it in the same
logical barrier, which never happens: thread 1 is lingering in the
thread pool at the thread pool barrier (using a different logical barrier),
waiting to join a new team.

The easiest way to fix this is to revert to the posix implementation for
bar.{c,h}.  That however falls back on a busy-waiting approach, and
does not take advantage of the ptx bar.sync insn.

Instead, we revert to the linux implementation for bar.c,
and implement bar.c local functions futex_wait and futex_wake using the
bar.sync insn.

The bar.sync insn takes an argument specifying how many threads are
participating, and that doesn't play well with the futex syntax where it's
not clear in advance how many threads will be woken up.

This is solved by waking up all waiting threads each time a futex_wait or
futex_wake happens, and possibly going back to sleep with an updated thread
count.

Tested libgomp on x86_64 with nvptx accelerator.

libgomp/ChangeLog:

2021-04-20  Tom de Vries  <tdevries@suse.de>

	PR target/99555
	* config/nvptx/bar.c (generation_to_barrier): New function, copied
	from config/rtems/bar.c.
	(futex_wait, futex_wake): New function.
	(do_spin, do_wait): New function, copied from config/linux/wait.h.
	(gomp_barrier_wait_end, gomp_barrier_wait_last)
	(gomp_team_barrier_wake, gomp_team_barrier_wait_end):
	(gomp_team_barrier_wait_cancel_end, gomp_team_barrier_cancel): Remove
	and replace with include of config/linux/bar.c.
	* config/nvptx/bar.h (gomp_barrier_t): Add fields waiters and lock.
	(gomp_barrier_init): Init new fields.
	* testsuite/libgomp.c-c++-common/task-detach-6.c: Remove nvptx-specific
	workarounds.
	* testsuite/libgomp.c/pr99555-1.c: Same.
	* testsuite/libgomp.fortran/task-detach-6.f90: Same.
2022-02-22 15:48:03 +01:00
Tom de Vries 6263b656c8 [libgomp, testsuite, nvptx] Fix pr96390.c without CUDA
When running the libgomp testsuite on x86_64 with nvptx accelerator, we run into:
...
XPASS: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors)
FAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c execution test
...

The problem is that we're expecting the following ptxas error:
...
XFAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors)
Excess errors:
ptxas /tmp/ccZYDw8N.o, line 90; error   : Call to 'baz' requires call prototype
ptxas /tmp/ccZYDw8N.o, line 90; error   : Unknown symbol 'baz'
...

But it's not triggered because ptxas is not in the path, so nvptx-none-as
defaults to --no-verify.

So instead, we run into the same error at execution time.

Fix this by forcing verification using:
...
/* { dg-additional-options "-foffload=-Wa,--verify" \
     { target offload_target_nvptx } } */
...
such that we run into the xfail in this way instead:
...
XFAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors)
Excess errors:
nvptx-as: error trying to exec 'ptxas': execvp: No such file or directory
nvptx-as: ptxas returned 255 exit status
...

Tested on x86_64-linux with nvptx accelerator.

libgomp/ChangeLog:

2022-02-21  Tom de Vries  <tdevries@suse.de>

	PR testsuite/104146
	* testsuite/libgomp.c++/pr96390.C: Add additional-option
	-foffload=-Wa,--verify for nvptx.
	* testsuite/libgomp.c-c++-common/pr96390.c: Same.
2022-02-22 10:23:20 +01:00
GCC Administrator 875e493bf5 Daily bump. 2022-02-16 00:16:26 +00:00
Tobias Burnus 3939c1b112 Fortran/OpenMP: Fix depend-clause handling
gcc/fortran/ChangeLog:

	* trans-openmp.cc (gfc_trans_omp_clauses, gfc_trans_omp_depobj):
	Depend on the proper addr, for ptr/alloc depend on pointee.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/depend-4.f90: New test.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/depend-4.f90: New test.
	* gfortran.dg/gomp/depend-5.f90: New test.
2022-02-15 12:26:48 +01:00
GCC Administrator a645583d4d Daily bump. 2022-02-11 00:16:25 +00:00
Tobias Burnus c22f3fb780 OpenMP/C++: Permit mapping classes with virtual members [PR102204]
PR c++/102204
gcc/cp/ChangeLog:

	* decl2.cc (cp_omp_mappable_type_1): Remove check for virtual
	members as those are permitted since OpenMP 5.0.

libgomp/ChangeLog:

	* testsuite/libgomp.c++/target-virtual-1.C: New test.

gcc/testsuite/ChangeLog:

	* g++.dg/gomp/unmappable-1.C: Remove previously expected dg-message.
2022-02-10 19:03:42 +01:00
Marcel Vollweiler bbb7f8604e C, C++, Fortran, OpenMP: Add 'has_device_addr' clause to 'target' construct.
This patch adds the 'has_device_addr' clause to the OpenMP 'target' construct
which was introduced in OpenMP 5.1 (OpenMP API 5.1 specification pp. 197ff):

	has_device_addr(list)

"The has_device_addr clause indicates that its list items already have device
addresses and therefore they may be directly accessed from a target device.
If the device address of a list item is not for the device on which the target
region executes, accessing the list item inside the region results in
unspecified behavior. The list items may include array sections." (p. 200)

"A list item may not be specified in both an is_device_ptr clause and a
has_device_addr clause on the directive." (p. 202)

"A list item that appears in an is_device_ptr or a has_device_addr clause must
not be specified in any data-sharing attribute clause on the same target
construct." (p. 203)

gcc/c-family/ChangeLog:

	* c-omp.cc (c_omp_split_clauses): Added OMP_CLAUSE_HAS_DEVICE_ADDR case.
	* c-pragma.h (enum pragma_kind): Added 5.1 in comment.
	(enum pragma_omp_clause): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR.

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_clause_name): Parse 'has_device_addr'
	clause.
	(c_parser_omp_variable_list): Handle array sections.
	(c_parser_omp_clause_has_device_addr): Added.
	(c_parser_omp_all_clauses): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR
	case.
	(c_parser_omp_target_exit_data): Added HAS_DEVICE_ADDR to
	OMP_CLAUSE_MASK.
	* c-typeck.cc (handle_omp_array_sections): Handle clause restrictions.
	(c_finish_omp_clauses): Handle array sections.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_omp_clause_name): Parse 'has_device_addr' clause.
	(cp_parser_omp_var_list_no_open): Handle array sections.
	(cp_parser_omp_all_clauses): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR
	case.
	(cp_parser_omp_target_update): Added HAS_DEVICE_ADDR to OMP_CLAUSE_MASK.
	* semantics.cc (handle_omp_array_sections): Handle clause restrictions.
	(finish_omp_clauses): Handle array sections.

gcc/fortran/ChangeLog:

	* dump-parse-tree.cc (show_omp_clauses): Added OMP_LIST_HAS_DEVICE_ADDR
	case.
	* gfortran.h: Added OMP_LIST_HAS_DEVICE_ADDR.
	* openmp.cc (enum omp_mask2): Added OMP_CLAUSE_HAS_DEVICE_ADDR.
	(gfc_match_omp_clauses): Parse HAS_DEVICE_ADDR clause.
	(resolve_omp_clauses): Same.
	* trans-openmp.cc (gfc_trans_omp_variable_list): Added
	OMP_LIST_HAS_DEVICE_ADDR case.
	(gfc_trans_omp_clauses): Firstprivatize of array descriptors.

gcc/ChangeLog:

	* gimplify.cc (gimplify_scan_omp_clauses): Added cases for
	OMP_CLAUSE_HAS_DEVICE_ADDR
	and handle array sections.
	(gimplify_adjust_omp_clauses): Added OMP_CLAUSE_HAS_DEVICE_ADDR case.
	* omp-low.cc (scan_sharing_clauses): Handle OMP_CLAUSE_HAS_DEVICE_ADDR.
	(lower_omp_target): Same.
	* tree-core.h (enum omp_clause_code): Same.
	* tree-nested.cc (convert_nonlocal_omp_clauses): Same.
	(convert_local_omp_clauses): Same.
	* tree-pretty-print.cc (dump_omp_clause): Same.
	* tree.cc: Same.

libgomp/ChangeLog:

	* libgomp.texi: Updated entry for HAS_DEVICE_ADDR.
	* target.c (copy_firstprivate_data): Copy only if host address is not
	NULL.
	* testsuite/libgomp.c++/target-has-device-addr-2.C: New test.
	* testsuite/libgomp.c++/target-has-device-addr-4.C: New test.
	* testsuite/libgomp.c++/target-has-device-addr-5.C: New test.
	* testsuite/libgomp.c++/target-has-device-addr-6.C: New test.
	* testsuite/libgomp.c-c++-common/target-has-device-addr-1.c: New test.
	* testsuite/libgomp.c/target-has-device-addr-3.c: New test.
	* testsuite/libgomp.fortran/target-has-device-addr-1.f90: New test.
	* testsuite/libgomp.fortran/target-has-device-addr-2.f90: New test.
	* testsuite/libgomp.fortran/target-has-device-addr-3.f90: New test.
	* testsuite/libgomp.fortran/target-has-device-addr-4.f90: New test.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/clauses-1.c: Added has_device_addr to test cases.
	* g++.dg/gomp/attrs-1.C: Added has_device_addr to test cases.
	* g++.dg/gomp/attrs-2.C: Added has_device_addr to test cases.
	* c-c++-common/gomp/target-has-device-addr-1.c: New test.
	* c-c++-common/gomp/target-has-device-addr-2.c: New test.
	* c-c++-common/gomp/target-is-device-ptr-1.c: New test.
	* c-c++-common/gomp/target-is-device-ptr-2.c: New test.
	* gfortran.dg/gomp/is_device_ptr-3.f90: New test.
	* gfortran.dg/gomp/target-has-device-addr-1.f90: New test.
	* gfortran.dg/gomp/target-has-device-addr-2.f90: New test.
2022-02-09 23:47:12 -08:00
GCC Administrator 2a2fda2d9b Daily bump. 2022-02-09 00:16:24 +00:00
Jakub Jelinek 0af7ef050a libgomp: Fix segfault with posthumous orphan tasks [PR104385]
The following patch fixes crashes with posthumous orphan tasks.
When a parent task finishes, gomp_clear_parent clears the parent
pointers of its children tasks present in the parent->children_queue.
But children that are still waiting for dependencies aren't in that
queue yet, they will be added there only when the sibling they are
waiting for exits.  Unfortunately we were adding those tasks into
the queues with the original task->parent which then causes crashes
because that task is gone and freed.  The following patch fixes that
by clearing the parent field when we schedule such task for running
by adding it into the queues and we know that the sibling task which
is about to finish has NULL parent.

2022-02-08  Jakub Jelinek  <jakub@redhat.com>

	PR libgomp/104385
	* task.c (gomp_task_run_post_handle_dependers): If parent is NULL,
	clear task->parent.
	* testsuite/libgomp.c/pr104385.c: New test.
2022-02-08 09:30:17 +01:00
GCC Administrator 3c1cbde16e Daily bump. 2022-02-05 00:16:31 +00:00
Tobias Burnus f62156eab7 libgomp.fortran/allocate-1.f90: Fix minor cleanup
libgomp/ChangeLog:
	* testsuite/libgomp.fortran/allocate-1.f90: Remove spurious
	STOP of previous commit.
2022-02-04 17:31:21 +01:00
Tobias Burnus 6d49813501 libgomp.fortran/allocate-1.f90: Minor cleanup
libgomp/ChangeLog:
	* testsuite/libgomp.fortran/allocate-1.c (is_64bit_aligned): Renamed
	from is_64bit_aligned_.
	* testsuite/libgomp.fortran/allocate-1.f90: Fix interface decl
	and use it, more implicit none, remove unused argument.
2022-02-04 14:51:01 +01:00
GCC Administrator 682ede3959 Daily bump. 2022-02-04 00:16:24 +00:00
David Seifert 45ba6bf28b make `-Werror` optional in libatomic/libbacktrace/libgomp/libitm/libsanitizer
* `-Werror` can cause issues when a more recent version of GCC compiles
  an older version:
  - https://bugs.gentoo.org/229059
  - https://bugs.gentoo.org/475350
  - https://bugs.gentoo.org/667104

libatomic/ChangeLog:

	* configure.ac: Support --disable-werror.
	* configure: Regenerate.

libbacktrace/ChangeLog:

	* configure.ac: Support --disable-werror.
	* configure: Regenerate.

libgomp/ChangeLog:

	* configure.ac: Support --disable-werror.
	* configure: Regenerate.

libitm/ChangeLog:

	* configure.ac: Support --disable-werror.
	* configure: Regenerate.

libsanitizer/ChangeLog:

	* configure.ac: Support --disable-werror.
	* aclocal.m4: Include also ../config/warnings.m4.
	* libbacktrace/Makefile.am (WARN_FLAGS): Remove.
	* configure: Regenerate.
	* Makefile.in: Regenerate.
	* asan/Makefile.in: Regenerate.
	* hwasan/Makefile.in: Regenerate.
	* interception/Makefile.in: Regenerate.
	* libbacktrace/Makefile.in: Regenerate.
	* lsan/Makefile.in: Regenerate.
	* sanitizer_common/Makefile.in: Regenerate.
	* tsan/Makefile.in: Regenerate.
	* ubsan/Makefile.in: Regenerate.

Co-Authored-By: Jakub Jelinek <jakub@redhat.com>
2022-02-03 16:10:18 +01:00
GCC Administrator ae7e4af964 Daily bump. 2022-02-02 00:17:16 +00:00
Tom de Vries e0451f93d9 [nvptx] Add some support for .local atomics
The ptx insn atom doesn't support local memory.  In case of doing an atomic
operation on local memory, we run into:
...
operation not supported on global/shared address space
...
This is the cuGetErrorString message for CUDA_ERROR_INVALID_ADDRESS_SPACE.

The message is somewhat confusing given that actually the operation is not
supported on local address space.

Fix this by falling back on a non-atomic version when detecting
a frame-related memory operand.

This only solves some cases that are detected at compile-time.  It does
however fix the openacc private-atomic-* test-cases.

Tested on x86_64 with nvptx accelerator.

gcc/ChangeLog:

2022-01-27  Tom de Vries  <tdevries@suse.de>

	* config/nvptx/nvptx.md (define_insn "atomic_compare_and_swap<mode>_1")
	(define_insn "atomic_exchange<mode>")
	(define_insn "atomic_fetch_add<mode>")
	(define_insn "atomic_fetch_addsf")
	(define_insn "atomic_fetch_<logic><mode>"): Output non-atomic version
	if memory operands is frame-relative.

gcc/testsuite/ChangeLog:

2022-01-31  Tom de Vries  <tdevries@suse.de>

	* gcc.target/nvptx/stack-atomics-run.c: New test.

libgomp/ChangeLog:

2022-01-27  Tom de Vries  <tdevries@suse.de>

	* testsuite/libgomp.oacc-c-c++-common/private-atomic-1.c: Remove
	PR83812 workaround.
	* testsuite/libgomp.oacc-fortran/private-atomic-1-vector.f90: Same.
	* testsuite/libgomp.oacc-fortran/private-atomic-1-worker.f90: Same.
2022-02-01 19:28:24 +01:00
Tom de Vries d43fbc7d3f [libgomp, testsuite] Fix insufficient resources in test-cases
When running libgomp test-case broadcast-many.c on an nvptx accelerator
(T400, driver version 470.86), I run into:
...
libgomp: The Nvidia accelerator has insufficient resources to launch \
  'main$_omp_fn$0' with num_workers = 32 and vector_length = 32; \
  recompile the program with 'num_workers = x and vector_length = y' on \
  that offloaded region or '-fopenacc-dim=y' where x * y <= 896.

FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/broadcast-many.c \
  -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  \
  -O0  execution test
...

The error does not occur when using GOMP_NVPTX_JIT=-O0.

Fix this by using 896 / 32 == 28 workers for ACC_DEVICE_TYPE_nvidia.

Likewise for some other test-cases.

Tested libgomp on x86_64 with nvptx accelerator.

libgomp/ChangeLog:

2022-01-27  Tom de Vries  <tdevries@suse.de>

	* testsuite/libgomp.oacc-c-c++-common/broadcast-many.c: Reduce
	num_workers for nvidia accelerator to fix libgomp error 'insufficient
	resources'.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c:
	Same.
	* testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Same.
2022-02-01 08:15:00 +01:00
Tom de Vries be362d5e12 [libgomp, testsuite] Reduce recursion depth in declare_target-*.f90
When running the libgomp testsuite with GOMP_NVPTX_JIT=-O0 using an nvptx
accelerator (Nvidia T400, 2GB), I run into:
...
libgomp: cuCtxSynchronize error: unspecified launch failure \
  (perhaps abort was called)

libgomp: cuMemFree_v2 error: unspecified launch failure

libgomp: device finalization failed
FAIL: libgomp.fortran/examples-4/declare_target-1.f90   -O0  execution test
...

The test-case contains:
...
  ! Reduced from 25 to 23, otherwise execution runs out of thread stack on
  ! Nvidia Titan V.
  if (fib (23) /= fib_wrapper (23)) stop 2
...

Fix this by reducing the fib/fib_wrapper argument from 23 to 22.

Same for declare_target-2.f90.

Tested on x86_64 with nvptx accelerator.

libgomp/ChangeLog:

2022-01-27  Tom de Vries  <tdevries@suse.de>

	* testsuite/libgomp.fortran/examples-4/declare_target-1.f90: Reduce
	recursion depth.
	* testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Same.
2022-02-01 08:13:06 +01:00
GCC Administrator 1bb5266257 Daily bump. 2022-02-01 00:16:29 +00:00
Martin Liska c99a6eb015 Add mold detection for libs.
libatomic/ChangeLog:

	* acinclude.m4: Detect *_ld_is_mold and use it.
	* configure: Regenerate.

libgomp/ChangeLog:

	* acinclude.m4: Detect *_ld_is_mold and use it.
	* configure: Regenerate.

libitm/ChangeLog:

	* acinclude.m4: Detect *_ld_is_mold and use it.
	* configure: Regenerate.

libstdc++-v3/ChangeLog:

	* acinclude.m4: Detect *_ld_is_mold and use it.
	* configure: Regenerate.
2022-01-31 09:46:44 +01:00
GCC Administrator 99f17e996f Daily bump. 2022-01-28 00:16:32 +00:00
Tobias Burnus b2a0f3a454 libgomp.texi: Update OpenMP implementation status
libgomp/
	* libgomp.texi (OpenMP 5.0): Update implementation status.
2022-01-27 09:39:23 +01:00
GCC Administrator 9dd443578f Daily bump. 2022-01-22 00:16:26 +00:00
Thomas Schwinge 087e545747 Strengthen a few OpenACC test cases
Rather than rubber-stamp whatever requested vs. actual device kernel launch
configuration happens, actually (again) verify the requested values (modulo
expected variations).

This better highlights that "AMD GCN has an upper limit of 'num_workers(16)'",
and the deficiency that "AMD GCN uses the autovectorizer for the vector
dimension: the use of a function call in vector-partitioned code [...] is not
currently supported".

And, this removes several instances of race conditions, where variables are
concurrently written to in OpenACC gang-redundant mode.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c: Strengthen.
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-v-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-v-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c: Likewise.
2022-01-21 18:45:30 +01:00
GCC Administrator fe1ad14165 Daily bump. 2022-01-20 00:16:54 +00:00
Marcel Vollweiler 0bd247bbbe libgomp, OpenMP: Fix issue for omp_get_device_num on gcn targets.
Currently omp_get_device_num does not work on gcn targets with more than one
offload device. The reason is that GOMP_DEVICE_NUM_VAR is static in
icv-device.c and thus "__gomp_device_num" is not visible in the offload image.

This patch removes "static" such that "__gomp_device_num" is now part of the
offload image and can now be found in GOMP_OFFLOAD_load_image in the plugin.

This is not an issue for nvptx. There, "__gomp_device_num" is in the offload
image even with "static".

libgomp/ChangeLog:

	* config/gcn/icv-device.c: Make GOMP_DEVICE_NUM_VAR public (remove
	"static") to make the device num available in the offload image.
2022-01-19 05:03:54 -08:00
Martin Liska 2aea19bdb1 nvptx: update fix for -Wformat-diag
gcc/ChangeLog:

	* config/nvptx/nvptx.cc (nvptx_goacc_validate_dims_1): Update
	warning messages.

libgomp/ChangeLog:

	* testsuite/libgomp.oacc-c++/privatized-ref-2.C: Update scanning
	patterns.
	* testsuite/libgomp.oacc-c++/privatized-ref-3.C: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/pr85486.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/pr95270-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-nohost-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/struct-copyout-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/struct-copyout-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/vector-length-64-1.c:
	Likewise.
	* testsuite/libgomp.oacc-fortran/attach-descriptor-1.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/derivedtypes-arrays-1.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-loop-2.f95: Likewise.
	* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: Likewise.

Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
2022-01-19 08:27:00 +01:00
GCC Administrator 7a761ae658 Daily bump. 2022-01-19 00:16:32 +00:00
Martin Liska b1f3640912 nvptx: fix -Wformat-diag warnings
gcc/ChangeLog:

	* config/nvptx/nvptx.cc (nvptx_goacc_validate_dims_1): Wrap
	keyword.
	* config/nvptx/nvptx.md: Remove trailing dot.

libgomp/ChangeLog:

	* testsuite/libgomp.oacc-c++/privatized-ref-2.C: Update keyword
	in dg-warning.
	* testsuite/libgomp.oacc-c++/privatized-ref-3.C: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/pr85486.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/pr95270-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-nohost-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/struct-copyout-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/struct-copyout-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/vector-length-64-1.c: Likewise.
	* testsuite/libgomp.oacc-fortran/attach-descriptor-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/derivedtypes-arrays-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-loop-2.f95: Likewise.
	* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: Likewise.
2022-01-18 17:25:36 +01:00
GCC Administrator fc82978278 Daily bump. 2022-01-18 00:16:54 +00:00
Thomas Schwinge b75aab194e Extend test cases for references in OpenACC 'private' clauses
libgomp/
	* testsuite/libgomp.oacc-c++/privatized-ref-2.C: Extend.
	* testsuite/libgomp.oacc-c++/privatized-ref-3.C: Likewise.
	* testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: Likewise.
2022-01-17 08:57:27 +01:00
Julian Brown fbb438808e Test cases for references in OpenACC 'private' clauses
libgomp/
	* testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: New test.
	* testsuite/libgomp.oacc-c++/privatized-ref-2.C: New test.
	* testsuite/libgomp.oacc-c++/privatized-ref-3.C: New test.

Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
2022-01-17 08:57:20 +01:00
GCC Administrator 1e942d7c05 Daily bump. 2022-01-17 00:16:24 +00:00
Kwok Cheung Yeung a78b1ab1df amdgcn: Tune default OpenMP/OpenACC GPU utilization
libgomp/
	* plugin/plugin-gcn.c (parse_target_attributes): Automatically set
	the number of teams and threads if necessary.
	(gcn_exec): Automatically set the number of gangs and workers if
	necessary.

Co-Authored-By: Andrew Stubbs  <ams@codesourcery.com>
2022-01-16 17:25:36 +01:00
GCC Administrator ad3f0d0806 Daily bump. 2022-01-14 00:16:30 +00:00
Hafiz Abid Qadeer 69561fc781 Add support for allocate clause (OpenMP 5.0).
This patch adds support for OpenMP 5.0 allocate clause for fortran. It does not
yet support the allocator-modifier as specified in OpenMP 5.1. The allocate
clause is already supported in C/C++.

gcc/fortran/ChangeLog:

	* dump-parse-tree.c (show_omp_clauses): Handle OMP_LIST_ALLOCATE.
	* gfortran.h (OMP_LIST_ALLOCATE): New enum value.
	* openmp.c (enum omp_mask1): Add OMP_CLAUSE_ALLOCATE.
	(gfc_match_omp_clauses): Handle OMP_CLAUSE_ALLOCATE
	(OMP_PARALLEL_CLAUSES, OMP_DO_CLAUSES, OMP_SECTIONS_CLAUSES)
	(OMP_TASK_CLAUSES, OMP_TASKLOOP_CLAUSES, OMP_TARGET_CLAUSES)
	(OMP_TEAMS_CLAUSES, OMP_DISTRIBUTE_CLAUSES)
	(OMP_SINGLE_CLAUSES): Add OMP_CLAUSE_ALLOCATE.
	(OMP_TASKGROUP_CLAUSES): New.
	(gfc_match_omp_taskgroup): Use OMP_TASKGROUP_CLAUSES instead of
	OMP_CLAUSE_TASK_REDUCTION.
	(resolve_omp_clauses): Handle OMP_LIST_ALLOCATE.
	(resolve_omp_do): Avoid warning when loop iteration variable is
	in allocate clause.
	* trans-openmp.c (gfc_trans_omp_clauses): Handle translation of
	allocate clause.
	(gfc_split_omp_clauses): Update for OMP_LIST_ALLOCATE.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/allocate-1.f90: New test.
	* gfortran.dg/gomp/allocate-2.f90: New test.
	* gfortran.dg/gomp/allocate-3.f90: New test.
	* gfortran.dg/gomp/collapse1.f90: Update error message.
	* gfortran.dg/gomp/openmp-simd-4.f90: Likewise.
	* gfortran.dg/gomp/clauses-1.f90: Uncomment allocate clause.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/allocate-1.c: New test.
	* testsuite/libgomp.fortran/allocate-1.f90: New test.
	* libgomp.texi: Remove string that says that allocate clause
	support is for C/C++ only.
2022-01-13 18:57:05 +00:00
Thomas Schwinge d97364aab1 Improve Intel MIC offloading XFAILing for 'omp_get_device_num'
After recent commit be661959a6
"libgomp/testsuite: Improve omp_get_device_num() tests", we're now iterating
over all OpenMP target devices.  Intel MIC (emulated) offloading still doesn't
properly implement device-side 'omp_get_device_num', and we thus regress:

    PASS: libgomp.c/../libgomp.c-c++-common/target-45.c (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/target-45.c execution test

    PASS: libgomp.c++/../libgomp.c-c++-common/target-45.c (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.c++/../libgomp.c-c++-common/target-45.c execution test

    PASS: libgomp.fortran/target10.f90   -O0  (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -O0  execution test
    PASS: libgomp.fortran/target10.f90   -O1  (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -O1  execution test
    PASS: libgomp.fortran/target10.f90   -O2  (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -O2  execution test
    PASS: libgomp.fortran/target10.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
    PASS: libgomp.fortran/target10.f90   -O3 -g  (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -O3 -g  execution test
    PASS: libgomp.fortran/target10.f90   -Os  (test for excess errors)
    [-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -Os  execution test

Improve the XFAILing added in commit bb75b22aba
"Allow matching Intel MIC in OpenMP 'declare variant'" for the case that *any*
Intel MIC offload device is available.

	libgomp/
	* testsuite/libgomp.c-c++-common/on_device_arch.h
	(any_device_arch, any_device_arch_intel_mic): New.
	* testsuite/lib/libgomp.exp
	(check_effective_target_offload_device_any_intel_mic): New.
	* testsuite/libgomp.c-c++-common/target-45.c: Use it.
	* testsuite/libgomp.fortran/target10.f90: Likewise.
2022-01-13 13:09:36 +01:00
Thomas Schwinge 2edbcaed95 Document current '-Wuninitialized' diagnostics for 'libgomp.oacc-fortran/routine-10.f90' [PR102192]
libgomp/
	PR tree-optimization/102192
	* testsuite/libgomp.oacc-fortran/routine-10.f90: Document current
	'-Wuninitialized' diagnostics.
2022-01-13 11:52:35 +01:00
Thomas Schwinge 4bd8b1e881 Document current '-Wuninitialized'/'-Wmaybe-uninitialized' diagnostics for OpenACC test cases
... including "note: '[...]' was declared here" emitted since recent
commit 9695e1c23b
"Improve -Wuninitialized note location".

For those that seemed incorrect to me, I've placed XFAILed 'dg-bogus'es,
including one more instance of PR77504 etc., and several instances where
for "local variables" of reference-data-type reductions (etc.?) we emit
bogus (?) diagnostics.

For implicit data clauses (including 'firstprivate'), we seem to be missing
diagnostics, so I've placed XFAILed 'dg-warning's.

	gcc/testsuite/
	* c-c++-common/goacc/builtin-goacc-parlevel-id-size.c: Document
	current '-Wuninitialized' diagnostics.
	* c-c++-common/goacc/mdc-1.c: Likewise.
	* c-c++-common/goacc/nested-reductions-1-kernels.c: Likewise.
	* c-c++-common/goacc/nested-reductions-1-parallel.c: Likewise.
	* c-c++-common/goacc/nested-reductions-1-routine.c: Likewise.
	* c-c++-common/goacc/nested-reductions-2-kernels.c: Likewise.
	* c-c++-common/goacc/nested-reductions-2-parallel.c: Likewise.
	* c-c++-common/goacc/nested-reductions-2-routine.c: Likewise.
	* c-c++-common/goacc/uninit-dim-clause.c: Likewise.
	* c-c++-common/goacc/uninit-firstprivate-clause.c: Likewise.
	* c-c++-common/goacc/uninit-if-clause.c: Likewise.
	* gfortran.dg/goacc/array-with-dt-1.f90: Likewise.
	* gfortran.dg/goacc/array-with-dt-2.f90: Likewise.
	* gfortran.dg/goacc/array-with-dt-3.f90: Likewise.
	* gfortran.dg/goacc/array-with-dt-4.f90: Likewise.
	* gfortran.dg/goacc/array-with-dt-5.f90: Likewise.
	* gfortran.dg/goacc/derived-chartypes-1.f90: Likewise.
	* gfortran.dg/goacc/derived-chartypes-2.f90: Likewise.
	* gfortran.dg/goacc/derived-chartypes-3.f90: Likewise.
	* gfortran.dg/goacc/derived-chartypes-4.f90: Likewise.
	* gfortran.dg/goacc/derived-classtypes-1.f95: Likewise.
	* gfortran.dg/goacc/derived-types-2.f90: Likewise.
	* gfortran.dg/goacc/host_data-tree.f95: Likewise.
	* gfortran.dg/goacc/kernels-tree.f95: Likewise.
	* gfortran.dg/goacc/modules.f95: Likewise.
	* gfortran.dg/goacc/nested-reductions-1-kernels.f90: Likewise.
	* gfortran.dg/goacc/nested-reductions-1-parallel.f90: Likewise.
	* gfortran.dg/goacc/nested-reductions-1-routine.f90: Likewise.
	* gfortran.dg/goacc/nested-reductions-2-kernels.f90: Likewise.
	* gfortran.dg/goacc/nested-reductions-2-parallel.f90: Likewise.
	* gfortran.dg/goacc/nested-reductions-2-routine.f90: Likewise.
	* gfortran.dg/goacc/parallel-tree.f95: Likewise.
	* gfortran.dg/goacc/pr93464.f90: Likewise.
	* gfortran.dg/goacc/privatization-1-compute-loop.f90: Likewise.
	* gfortran.dg/goacc/privatization-1-compute.f90: Likewise.
	* gfortran.dg/goacc/privatization-1-routine_gang-loop.f90:
	Likewise.
	* gfortran.dg/goacc/privatization-1-routine_gang.f90: Likewise.
	* gfortran.dg/goacc/uninit-dim-clause.f95: Likewise.
	* gfortran.dg/goacc/uninit-firstprivate-clause.f95: Likewise.
	* gfortran.dg/goacc/uninit-if-clause.f95: Likewise.
	* gfortran.dg/goacc/uninit-use-device-clause.f95: Likewise.
	* gfortran.dg/goacc/wait.f90: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Document
	current '-Wuninitialized' diagnostics.
	* testsuite/libgomp.oacc-fortran/data-5.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/gemm-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/gemm.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/optional-reduction.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/parallel-reduction.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/pr70643.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/pr96628-part1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-5.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-7.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reference-reductions.f90:
	Likewise.
2022-01-13 11:52:35 +01:00
Thomas Schwinge 9fcc3a1dd2 Host and offload targets have no common meaning of address spaces
gcc/
	* tree-streamer-out.c (pack_ts_base_value_fields): Don't pack
	'TYPE_ADDR_SPACE' for offloading.
	* tree-streamer-in.c (unpack_ts_base_value_fields): Don't unpack
	'TYPE_ADDR_SPACE' for offloading.
	libgomp/
	* testsuite/libgomp.c/address-space-1.c: Remove 'dg-xfail-run-if'
	for 'offload_device_intel_mic'.
2022-01-13 11:16:20 +01:00
Julian Brown e52253bcc0 Wait at end of OpenACC asynchronous kernels regions
In OpenACC 'kernels' decomposition, we're improperly nesting synchronous and
asynchronous data and compute regions, giving rise to data races when the
asynchronicity is actually executed, as is visible in at least on test case
with GCN offloading.

The proper fix is to correctly use the asynchronous interfaces, making the
currently synchronous data regions fully asynchronous (see also
<https://gcc.gnu.org/PR97390> "[OpenACC] 'async' clause on 'data' construct",
which is to share the same implementation), but that's for later; for now add
some more synchronization.

	gcc/
	* omp-oacc-kernels-decompose.cc (add_wait): New function, split out
	of...
	(add_async_clauses_and_wait): ...here. Call new outlined function.
	(decompose_kernels_region_body): Add wait at the end of
	explicitly-asynchronous kernels regions.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Remove GCN
	offloading execution XFAIL.

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2022-01-13 10:42:17 +01:00
Thomas Schwinge 9b32c1669a OpenACC 'kernels' decomposition: Mark variables used in synthesized data clauses as addressable [PR100280]
... as otherwise 'gcc/omp-low.c:lower_omp_target' has to create a temporary:

    13073			else if (is_gimple_reg (var))
    13074			  {
    13075			    gcc_assert (offloaded);
    13076			    tree avar = create_tmp_var (TREE_TYPE (var));
    13077			    mark_addressable (avar);

..., which (a) is only implemented for actualy *offloaded* regions (but not
data regions), and (b) the subsequently synthesized code for writing to and
later reading back from the temporary fundamentally conflicts with OpenACC
'async' (as used by OpenACC 'kernels' decomposition).  That's all not trivial
to make work, so let's just avoid this case.

	gcc/
	PR middle-end/100280
	* omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region):
	Mark variables used in synthesized data clauses as addressable.
	gcc/testsuite/
	PR middle-end/100280
	* c-c++-common/goacc/kernels-decompose-pr100280-1.c: New.
	* c-c++-common/goacc/classify-kernels-parloops.c: Likewise.
	* c-c++-common/goacc/classify-kernels-unparallelized-parloops.c:
	Likewise.
	* c-c++-common/goacc/classify-kernels-unparallelized.c: Test
	'--param openacc-kernels=decompose'.
	* c-c++-common/goacc/classify-kernels.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-2.c: Update.
	* c-c++-common/goacc/kernels-decompose-ice-1.c: Remove.
	* c-c++-common/goacc/kernels-decompose-ice-2.c: Likewise.
	* gfortran.dg/goacc/classify-kernels-parloops.f95: New.
	* gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95:
	Likewise.
	* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Test
	'--param openacc-kernels=decompose'.
	* gfortran.dg/goacc/classify-kernels.f95: Likewise.
	libgomp/
	PR middle-end/100280
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
	Update.
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Likewise.

Suggested-by: Julian Brown <julian@codesourcery.com>
2022-01-13 10:42:17 +01:00
Thomas Schwinge 862e5f398b Enhance OpenACC 'kernels' decomposition testing
gcc/testsuite/
	* c-c++-common/goacc/kernels-decompose-1.c: Enhance.
	* c-c++-common/goacc/kernels-decompose-2.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-ice-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-ice-2.c: Likewise.
	* gfortran.dg/goacc/kernels-decompose-1.f95: Likewise.
	* gfortran.dg/goacc/kernels-decompose-2.f95: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
	Enhance.
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Likewise.
	* testsuite/libgomp.oacc-fortran/asyncwait-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/asyncwait-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/asyncwait-3.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise.
2022-01-13 10:42:17 +01:00
GCC Administrator 7d11b64b18 Daily bump. 2022-01-05 00:16:52 +00:00
Tobias Burnus be661959a6 libgomp/testsuite: Improve omp_get_device_num() tests
Related to r12-6208-gebc853deb7cc0487de9ef6e891a007ba853d1933
"libgomp: Fix GOMP_DEVICE_NUM_VAR stringification during offload image load"

That commit fixed an issue with omp_get_device_num() on gcn/nvptx that
resulted in having always the value 0.
This commit modifies the tests to iterate over all devices such that on a
multi-nonhost-device system it had detected that always-zero issue.

libgomp/ChangeLog:

	* testsuite/libgomp.c-c++-common/target-45.c: Iterate over all devices.
	* testsuite/libgomp.fortran/target10.f90: Likewise.
2022-01-04 14:58:06 +01:00
Chung-Lin Tang fbb592407c libgomp: Fix GOMP_DEVICE_NUM_VAR stringification during offload image load
In the patch that implemented omp_get_device_num(), there was an error where
the stringification of GOMP_DEVICE_NUM_VAR, which is the macro expanding to
the actual symbol used, was erroneously using the STRINGX() macro in the
libgomp offload image symbol search, and expansion of the variable name
string through the additional layer of preprocessor symbol was not properly
achieved.

This patch fixes this by changing to properly use XSTRING(), also from
include/symcat.h.

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Change uses of STRINGX
	into XSTRING when looking for GOMP_DEVICE_NUM_VAR in offload image.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise.
2022-01-04 17:26:23 +08:00
GCC Administrator a4ae8c3701 Daily bump. 2022-01-04 00:16:40 +00:00
Jakub Jelinek 7adcbafe45 Update copyright years. 2022-01-03 10:42:10 +01:00
Jakub Jelinek 877e3c2abf Update Copyright in ChangeLog files
Do this separately from all other Copyright updates, as ChangeLog files
can be modified only separately.
2022-01-03 10:31:39 +01:00
Jakub Jelinek abc1ac2d8d Update copyright dates.
Manual part of copyright year updates.

2022-01-03  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* gcc.c (process_command): Update copyright notice dates.
	* gcov-dump.c (print_version): Ditto.
	* gcov.c (print_version): Ditto.
	* gcov-tool.c (print_version): Ditto.
	* gengtype.c (create_file): Ditto.
	* doc/cpp.texi: Bump @copying's copyright year.
	* doc/cppinternals.texi: Ditto.
	* doc/gcc.texi: Ditto.
	* doc/gccint.texi: Ditto.
	* doc/gcov.texi: Ditto.
	* doc/install.texi: Ditto.
	* doc/invoke.texi: Ditto.
gcc/ada/
	* gnat_ugn.texi: Bump @copying's copyright year.
	* gnat_rm.texi: Likewise.
gcc/d/
	* gdc.texi: Bump @copyrights-d year.
gcc/fortran/
	* gfortranspec.c (lang_specific_driver): Update copyright notice
	dates.
	* gfc-internals.texi: Bump @copying's copyright year.
	* gfortran.texi: Ditto.
	* intrinsic.texi: Ditto.
	* invoke.texi: Ditto.
gcc/go/
	* gccgo.texi: Bump @copyrights-go year.
libgomp/
	* libgomp.texi: Bump @copying's copyright year.
libitm/
	* libitm.texi: Bump @copying's copyright year.
libquadmath/
	* libquadmath.texi: Bump @copying's copyright year.
2022-01-03 10:27:43 +01:00
GCC Administrator 7f1239cb43 Daily bump. 2021-12-14 00:16:25 +00:00
Tobias Burnus 494ebfa7c9 Fortran: Handle compare in OpenMP atomic
gcc/fortran/ChangeLog:

	PR fortran/103576
	* openmp.c (is_scalar_intrinsic_expr): Fix condition.
	(resolve_omp_atomic): Fix/update checks, accept compare.
	* trans-openmp.c (gfc_trans_omp_atomic): Handle compare.

libgomp/ChangeLog:

	* libgomp.texi (OpenMP 5.1): Set Fortran support for atomic to 'Y'.
	* testsuite/libgomp.fortran/atomic-19.f90: New test.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/atomic-25.f90: Remove sorry, fix + add checks.
	* gfortran.dg/gomp/atomic-26.f90: Likewise.
	* gfortran.dg/gomp/atomic-21.f90: New test.
2021-12-13 12:38:26 +01:00
GCC Administrator 0bceef1671 Daily bump. 2021-12-11 00:16:30 +00:00
Andrew Stubbs 4a87a8e4b1 amdgcn: Change offload variable table discovery
Up to now the libgomp GCN plugin has been finding the offload variables
by using a symbol lookup, but the AMD runtime requires that the symbols are
global for that to work. This was ensured by mkoffload as a post-procssing
step, but the LLVM 13 assembler no longer accepts this in the case where the
variable was previously declared differently.

This patch switches to locating the symbols directly from the
offload_var_table, which means that only one symbol needs to be forced
global.

This changes breaks the libgomp image compatibility so GOMP_VERSION_GCN has
also been bumped.

gcc/ChangeLog:

	* config/gcn/mkoffload.c (process_asm): Process the variable table
	completely differently.
	(process_obj): Encode the varaible data differently.

include/ChangeLog:

	* gomp-constants.h (GOMP_VERSION_GCN): Bump.

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (struct gcn_image_desc): Remove global_variables.
	(GOMP_OFFLOAD_load_image): Locate the offload variables via the
	table, not individual symbols.
2021-12-10 10:45:09 +00:00
GCC Administrator 4b4839e325 Daily bump. 2021-12-10 00:16:29 +00:00
Chung-Lin Tang 2766448c5c openmp: Fix libgomp.c++ testsuite errors for non-offload configs
Some testcases for libgomp.c++ only works for non-shared address space offloading,
because it exercises the zero-length array section behavior for offloaded
address space, testing for NULL/non-NULL cases.

libgomp/ChangeLog:

	* testsuite/libgomp.c++/target-lambda-1.C: Only run under
	"target offload_device_nonshared_as"
	* testsuite/libgomp.c++/target-this-3.C: Likewise.
	* testsuite/libgomp.c++/target-this-4.C: Likewise.
2021-12-10 00:39:03 +08:00
GCC Administrator 641ff2196f Daily bump. 2021-12-09 00:16:31 +00:00
Chung-Lin Tang 6c0399378e OpenMP 5.0: Remove array section base-pointer mapping semantics and other front-end adjustments
This patch implements three pieces of functionality:

(1) Adjust array section mapping to have standards conforming behavior,
mapping array sections should *NOT* also map the base-pointer:

struct S { int *ptr; ... };
struct S s;

Instead of generating this during gimplify:
                              map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0])

Now, adjust to:

(i.e. do not map the base-pointer together. The attach operation is still
generated, and if s.ptr is already mapped prior, attachment will happen)

The correct way of achieving the base-pointer-also-mapped behavior would be to
use:

(A small Fortran front-end patch to trans-openmp.c:gfc_trans_omp_array_section
 is also included, which removes generation of a GOMP_MAP_ALWAYS_POINTER for
 array types, which appears incorrect and causes a regression in
 libgomp.fortranlibgomp.fortran/struct-elem-map-1.f90)

(2) Related to the first item above, are fixes in libgomp/target.c to not
overwrite attached pointers when handling device<->host copies, mainly for the
"always" case.

(3) The third is a set of changes to the C/C++ front-ends to extend the allowed
component access syntax in map clauses. These changes are enabled for both
OpenACC and OpenMP.

gcc/c/ChangeLog:

	* c-parser.c (struct omp_dim): New struct type for use inside
	c_parser_omp_variable_list.
	(c_parser_omp_variable_list): Allow multiple levels of array and
	component accesses in array section base-pointer expression.
	(c_parser_omp_clause_to): Set 'allow_deref' to true in call to
	c_parser_omp_var_list_parens.
	(c_parser_omp_clause_from): Likewise.
	* c-typeck.c (handle_omp_array_sections_1): Extend allowed range
	of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and
	POINTER_PLUS_EXPR.
	(c_finish_omp_clauses): Extend allowed ranged of expressions
	involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR.

gcc/cp/ChangeLog:

	* parser.c (struct omp_dim): New struct type for use inside
	cp_parser_omp_var_list_no_open.
	(cp_parser_omp_var_list_no_open): Allow multiple levels of array and
	component accesses in array section base-pointer expression.
	(cp_parser_omp_all_clauses): Set 'allow_deref' to true in call to
	cp_parser_omp_var_list for to/from clauses.
	* semantics.c (handle_omp_array_sections_1): Extend allowed range
	of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and
	POINTER_PLUS_EXPR.
	(handle_omp_array_sections): Adjust pointer map generation of
	references.
	(finish_omp_clauses): Extend allowed ranged of expressions
	involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR.

gcc/fortran/ChangeLog:

	* trans-openmp.c (gfc_trans_omp_array_section): Do not generate
	GOMP_MAP_ALWAYS_POINTER map for main array maps of ARRAY_TYPE type.

gcc/ChangeLog:

	* gimplify.c (extract_base_bit_offset): Add 'tree *offsetp' parameter,
	accomodate case where 'offset' return of get_inner_reference is
	non-NULL.
	(is_or_contains_p): Further robustify conditions.
	(omp_target_reorder_clauses): In alloc/to/from sorting phase, also
	move following GOMP_MAP_ALWAYS_POINTER maps along.  Add new sorting
	phase where we make sure pointers with an attach/detach map are ordered
	correctly.
	(gimplify_scan_omp_clauses): Add modifications to avoid creating
	GOMP_MAP_STRUCT and associated alloc map for attach/detach maps.

gcc/testsuite/ChangeLog:

	* c-c++-common/goacc/deep-copy-arrayofstruct.c: Adjust testcase.
	* c-c++-common/gomp/target-enter-data-1.c: New testcase.
	* c-c++-common/gomp/target-implicit-map-2.c: New testcase.

libgomp/ChangeLog:

	* target.c (gomp_map_vars_existing): Make sure attached pointer is
	not overwritten during cross-host/device copying.
	(gomp_update): Likewise.
	(gomp_exit_data): Likewise.
	* testsuite/libgomp.c++/target-11.C: Adjust testcase.
	* testsuite/libgomp.c++/target-12.C: Likewise.
	* testsuite/libgomp.c++/target-15.C: Likewise.
	* testsuite/libgomp.c++/target-16.C: Likewise.
	* testsuite/libgomp.c++/target-17.C: Likewise.
	* testsuite/libgomp.c++/target-21.C: Likewise.
	* testsuite/libgomp.c++/target-23.C: Likewise.
	* testsuite/libgomp.c/target-23.c: Likewise.
	* testsuite/libgomp.c/target-29.c: Likewise.
	* testsuite/libgomp.c-c++-common/target-implicit-map-2.c: New testcase.
2021-12-09 00:01:10 +08:00
Chung-Lin Tang 0ab29cf0bb openmp: Improve OpenMP target support for C++ (PR92120)
This patch implements several C++ specific mapping capabilities introduced for
OpenMP 5.0, including implicit mapping of this[:1] for non-static member
functions, zero-length array section mapping of pointer-typed members,
lambda captured variable access in target regions, and use of lambda objects
inside target regions.

Several adjustments to the C/C++ front-ends to allow more member-access syntax
as valid is also included.

	PR middle-end/92120

gcc/cp/ChangeLog:

	* cp-tree.h (finish_omp_target): New declaration.
	(finish_omp_target_clauses): Likewise.
	* parser.c (cp_parser_omp_clause_map): Adjust call to
	cp_parser_omp_var_list_no_open to set 'allow_deref' argument to true.
	(cp_parser_omp_target): Factor out code, adjust into calls to new
	function finish_omp_target.
	* pt.c (tsubst_expr): Add call to finish_omp_target_clauses for
	OMP_TARGET case.
	* semantics.c (handle_omp_array_sections_1): Add handling to create
	'this->member' from 'member' FIELD_DECL. Remove case of rejecting
	'this' when not in declare simd.
	(handle_omp_array_sections): Likewise.
	(finish_omp_clauses): Likewise. Adjust to allow 'this[]' in OpenMP
	map clauses. Handle 'A->member' case in map clauses. Remove case of
	rejecting 'this' when not in declare simd.
	(struct omp_target_walk_data): New struct for walking over
	target-directive tree body.
	(finish_omp_target_clauses_r): New function for tree walk.
	(finish_omp_target_clauses): New function.
	(finish_omp_target): New function.

gcc/c/ChangeLog:

	* c-parser.c (c_parser_omp_clause_map): Set 'allow_deref' argument in
	call to c_parser_omp_variable_list to 'true'.
	* c-typeck.c (handle_omp_array_sections_1): Add strip of MEM_REF in
	array base handling.
	(c_finish_omp_clauses): Handle 'A->member' case in map clauses.

gcc/ChangeLog:

	* gimplify.c ("tree-hash-traits.h"): Add include.
	(gimplify_scan_omp_clauses): Change struct_map_to_clause to type
	hash_map<tree_operand, tree> *. Adjust struct map handling to handle
	cases of *A and A->B expressions. Under !DECL_P case of
	GOMP_CLAUSE_MAP handling, add STRIP_NOPS for indir_p case, add to
	struct_deref_set for map(*ptr_to_struct) cases. Add MEM_REF case when
	handling component_ref_p case. Add unshare_expr and gimplification
	when created GOMP_MAP_STRUCT is not a DECL. Add code to add
	firstprivate pointer for *pointer-to-struct case.
	(gimplify_adjust_omp_clauses): Move GOMP_MAP_STRUCT removal code for
	exit data directives code to earlier position.
	* omp-low.c (lower_omp_target):
	Handle GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and
	GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds.
	* tree-pretty-print.c (dump_omp_clause): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.dg/gomp/target-3.c: New testcase.
	* g++.dg/gomp/target-3.C: New testcase.
	* g++.dg/gomp/target-lambda-1.C: New testcase.
	* g++.dg/gomp/target-lambda-2.C: New testcase.
	* g++.dg/gomp/target-this-1.C: New testcase.
	* g++.dg/gomp/target-this-2.C: New testcase.
	* g++.dg/gomp/target-this-3.C: New testcase.
	* g++.dg/gomp/target-this-4.C: New testcase.
	* g++.dg/gomp/target-this-5.C: New testcase.
	* g++.dg/gomp/this-2.C: Adjust testcase.

include/ChangeLog:

	* gomp-constants.h (enum gomp_map_kind):
	Add GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and
	GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds.
	(GOMP_MAP_POINTER_P):
	Include GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION.

libgomp/ChangeLog:

	* libgomp.h (gomp_attach_pointer): Add bool parameter.
	* oacc-mem.c (acc_attach_async): Update call to gomp_attach_pointer.
	(goacc_enter_data_internal): Likewise.
	* target.c (gomp_map_vars_existing): Update assert condition to
	include GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION.
	(gomp_map_pointer): Add 'bool allow_zero_length_array_sections'
	parameter, add support for mapping a pointer with NULL target.
	(gomp_attach_pointer): Add 'bool allow_zero_length_array_sections'
	parameter, add support for attaching a pointer with NULL target.
	(gomp_map_vars_internal): Update calls to gomp_map_pointer and
	gomp_attach_pointer, add handling for
	GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and
	GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION cases.
	* testsuite/libgomp.c++/target-23.C: New testcase.
	* testsuite/libgomp.c++/target-lambda-1.C: New testcase.
	* testsuite/libgomp.c++/target-lambda-2.C: New testcase.
	* testsuite/libgomp.c++/target-this-1.C: New testcase.
	* testsuite/libgomp.c++/target-this-2.C: New testcase.
	* testsuite/libgomp.c++/target-this-3.C: New testcase.
	* testsuite/libgomp.c++/target-this-4.C: New testcase.
	* testsuite/libgomp.c++/target-this-5.C: New testcase.
2021-12-08 22:29:06 +08:00
GCC Administrator 70e4cb66c1 Daily bump. 2021-12-05 00:16:28 +00:00
Tobias Burnus 689407ef91 Fortran/OpenMP: Support most of 5.1 atomic extensions
Implements moste of OpenMP 5.1 atomic extensions,
except that 'compare' is parsed but rejected during
resolution. (As the trans-openmp.c handling is missing.)

gcc/fortran/ChangeLog:

	* dump-parse-tree.c (show_omp_clauses): Handle
	weak/compare/fail clause.
	* gfortran.h (gfc_omp_clauses): Add weak, compare, fail.
	* openmp.c (enum omp_mask1, gfc_match_omp_clauses,
	OMP_ATOMIC_CLAUSES): Update for new clauses.
	(gfc_match_omp_atomic): Update for 5.1 atomic changes.
	(is_conversion): Support widening in one go.
	(is_scalar_intrinsic_expr): New.
	(resolve_omp_atomic): Update for 5.1 atomic changes.
	* parse.c (parse_omp_oacc_atomic): Update for compare.
	* resolve.c (gfc_resolve_blocks): Update asserts.
	* trans-openmp.c (gfc_trans_omp_atomic): Handle new clauses.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/atomic-2.f90: Move now supported code to ...
	* gfortran.dg/gomp/atomic.f90: here.
	* gfortran.dg/gomp/atomic-10.f90: New test.
	* gfortran.dg/gomp/atomic-12.f90: New test.
	* gfortran.dg/gomp/atomic-15.f90: New test.
	* gfortran.dg/gomp/atomic-16.f90: New test.
	* gfortran.dg/gomp/atomic-17.f90: New test.
	* gfortran.dg/gomp/atomic-18.f90: New test.
	* gfortran.dg/gomp/atomic-19.f90: New test.
	* gfortran.dg/gomp/atomic-20.f90: New test.
	* gfortran.dg/gomp/atomic-22.f90: New test.
	* gfortran.dg/gomp/atomic-24.f90: New test.
	* gfortran.dg/gomp/atomic-25.f90: New test.
	* gfortran.dg/gomp/atomic-26.f90: New test.

libgomp/ChangeLog

	* libgomp.texi (OpenMP 5.1): Update status.
2021-12-04 19:43:46 +01:00
Tobias Burnus b09af56214 libgomp.texi: Update OMP_PLACES
libgomp/ChangeLog:

	* libgomp.texi (OMP_PLACES): Extend description for OMP 5.1 changes.
2021-12-04 13:28:03 +01:00
GCC Administrator ea6ef320b0 Daily bump. 2021-12-03 00:17:04 +00:00
Chung-Lin Tang 1ac7a8c9e4 fortran: OpenMP/OpenACC array mapping alignment fix (PR90030)
Fix issue with the Fortran front-end when mapping arrays: when creating the
data MEM_REF for the map clause, there was a convention of casting the
referencing pointer to 'c_char *' by
fold_convert (build_pointer_type (char_type_node), ptr).

This causes the alignment passed to the libgomp runtime for array data
hardwared to '1', and causes alignment errors on the offload target.

This patch fixes this by removing the char_type_node pointer converts, and
adding gcc_asserts to ensure POINTER_TYPE_P (TREE_TYPE (ptr)).

	PR fortran/90030

gcc/fortran/ChangeLog:

	* trans-openmp.c (gfc_omp_finish_clause): Remove fold_convert to pointer
	to char_type_node, add gcc_assert of POINTER_TYPE_P.
	(gfc_trans_omp_array_section): Likewise.
	(gfc_trans_omp_clauses): Likewise.

gcc/testsuite/ChangeLog:

	* gfortran.dg/goacc/finalize-1.f: Adjust scan test.
	* gfortran.dg/gomp/affinity-clause-1.f90: Likewise.
	* gfortran.dg/gomp/affinity-clause-5.f90: Likewise.
	* gfortran.dg/gomp/defaultmap-4.f90: Likewise.
	* gfortran.dg/gomp/defaultmap-5.f90: Likewise.
	* gfortran.dg/gomp/defaultmap-6.f90: Likewise.
	* gfortran.dg/gomp/map-3.f90: Likewise.
	* gfortran.dg/gomp/pr78260-2.f90: Likewise.
	* gfortran.dg/gomp/pr78260-3.f90: Likewise.

libgomp/ChangeLog:

	* testsuite/libgomp.oacc-fortran/pr90030.f90: New test.
	* testsuite/libgomp.fortran/pr90030.f90: New test.
2021-12-02 18:27:16 +08:00
GCC Administrator c177e80609 Daily bump. 2021-12-01 00:17:04 +00:00
Kwok Cheung Yeung f1a58ab0db [OpenACC] Allow gang reductions inside serial constructs
... fixing a regression introduced in the preceding
commit 2b7dac2c0d
"Make OpenACC orphan gang reductions errors".

	gcc/fortran/
	* openmp.c (oacc_is_serial, oacc_is_parallel_or_serial): New.
	(resolve_oacc_loop_blocks): Use oacc_is_parallel_or_serial instead of
	oacc_is_parallel.
	libgomp/
	* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Remove
	temporary skip.

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2021-11-30 12:58:53 +01:00
Cesar Philippidis 2b7dac2c0d Make OpenACC orphan gang reductions errors
This patch promotes all OpenACC gang reductions on orphan loops as
errors. Accord to the spec, orphan loops are those which are not
lexically nested inside an OpenACC parallel or kernels regions. I.e.,
acc loops inside acc routines.

At first I thought this could be a warning because the gang reduction
finalizer uses an atomic update. However, because there is no
synchronization between gangs, there is way to guarantee that reduction
will have completed once a single gang entity returns from the acc
routine call.

	gcc/c/
	* c-typeck.c (c_finish_omp_clauses): Emit an error on orphan
	OpenACC gang reductions.
	gcc/cp/
	* semantics.c (finish_omp_clauses): Emit an error on orphan
	OpenACC gang reductions.
	gcc/fortran/
	* openmp.c (oacc_is_parallel, oacc_is_kernels): New 'static'
	functions.
	(resolve_oacc_loop_blocks): Emit an error on orphan OpenACC gang
	reductions.
	gcc/
	* omp-general.h (enum oacc_loop_flags): Add OLF_REDUCTION enum.
	* omp-low.c (lower_oacc_head_mark): Use it to mark OpenACC
	reductions.
	* omp-offload.c (oacc_loop_auto_partitions): Don't assign gang
	level parallelism to orphan reductions.
	gcc/testsuite/
	* c-c++-common/goacc/nested-reductions-1-routine.c: Adjust.
	* c-c++-common/goacc/nested-reductions-2-routine.c: Likewise.
	* gcc.dg/goacc/loop-processing-1.c: Likewise.
	* gfortran.dg/goacc/nested-reductions-1-routine.f90: Likewise.
	* gfortran.dg/goacc/nested-reductions-2-routine.f90: Likewise.
	* c-c++-common/goacc/orphan-reductions-1.c: New test.
	* c-c++-common/goacc/orphan-reductions-2.c: New test.
	* gfortran.dg/goacc/orphan-reductions-1.f90: New test.
	* gfortran.dg/goacc/orphan-reductions-2.f90: New test.
	libgomp/
	* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Temporarily
	skip.

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2021-11-30 12:58:45 +01:00
GCC Administrator 87cd82c81d Daily bump. 2021-11-30 00:16:44 +00:00
Richard Biener 16507dea75 Remove unreachable returns
This removes unreachable return statements as diagnosed by
the -Wunreachable-code patch.  Some cases are more obviously
an improvement than others - in fact some may get you the idea
to replace them with gcc_unreachable () instead, leading to
cases of the 'Remove unreachable gcc_unreachable () at the end
of functions' patch.

2021-11-25  Richard Biener  <rguenther@suse.de>

	* vec.c (qsort_chk): Do not return the void return value
	from the noreturn qsort_chk_error.
	* ccmp.c (expand_ccmp_expr_1): Remove unreachable return.
	* df-scan.c (df_ref_equal_p): Likewise.
	* dwarf2out.c (is_base_type): Likewise.
	(add_const_value_attribute): Likewise.
	* fixed-value.c (fixed_arithmetic): Likewise.
	* gimple-fold.c (gimple_fold_builtin_fputs): Likewise.
	* gimple-ssa-strength-reduction.c (stmt_cost): Likewise.
	* graphite-isl-ast-to-gimple.c
	(gcc_expression_from_isl_expr_op): Likewise.
	(gcc_expression_from_isl_expression): Likewise.
	* ipa-fnsummary.c (will_be_nonconstant_expr_predicate):
	Likewise.
	* lto-streamer-in.c (lto_input_mode_table): Likewise.

gcc/c-family/
	* c-opts.c (c_common_post_options): Remove unreachable return.
	* c-pragma.c (handle_pragma_target): Likewise.
	(handle_pragma_optimize): Likewise.

gcc/c/
	* c-typeck.c (c_tree_equal): Remove unreachable return.
	* c-parser.c (get_matching_symbol): Likewise.

libgomp/
	* oacc-plugin.c (GOMP_PLUGIN_acc_default_dim): Remove unreachable
	return.
2021-11-29 11:17:22 +01:00
GCC Administrator d9ca4b45bd Daily bump. 2021-11-25 00:16:29 +00:00
Jakub Jelinek 5bca26742c openmp: Fix up handling of kind(host) and kind(nohost) in ACCEL_COMPILERs [PR103384]
As the testcase shows, we weren't handling kind(host) and kind(nohost) properly
in the ACCEL_COMPILERs, the code written in there is valid for the host
compiler only, where if we are maybe offloaded, we defer resolution after IPA,
otherwise return 0 for kind(nohost) and accept it for kind(host).  Note,
omp_maybe_offloaded is false after IPA.  If ACCEL_COMPILER is defined, it is
the other way around, but also we know we are after IPA.

2021-11-24  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/103384
gcc/
	* omp-general.c (omp_context_selector_matches): For ACCEL_COMPILER,
	return 0 for kind(host) and continue for kind(nohost).
libgomp/
	* testsuite/libgomp.c/declare-variant-2.c: New test.
2021-11-24 10:30:32 +01:00
GCC Administrator 483092d3d9 Daily bump. 2021-11-19 00:16:34 +00:00
David Edelsohn 1b2b930152 Fix typo.
libgomp/ChangeLog:

	* alloc.c (gomp_aligned_alloc): Fix typo.
2021-11-18 10:26:55 -05:00
Jakub Jelinek 17da2c7425 libgomp: Ensure that either gomp_team is properly aligned [PR102838]
struct gomp_team has struct gomp_work_share array inside of it.
If that latter structure has 64-byte aligned member in the middle,
the whole struct gomp_team needs to be 64-byte aligned, but we weren't
allocating it using gomp_aligned_alloc.

This patch fixes that, except that on gcn team_malloc is special, so
I've instead decided at least for now to avoid using aligned member
and use the padding instead on gcn.

2021-11-18  Jakub Jelinek  <jakub@redhat.com>

	PR libgomp/102838
	* libgomp.h (GOMP_USE_ALIGNED_WORK_SHARES): Define if
	GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC is defined and __AMDGCN__ is not.
	(struct gomp_work_share): Use GOMP_USE_ALIGNED_WORK_SHARES instead of
	GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC.
	* work.c (alloc_work_share, gomp_work_share_start): Likewise.
	* team.c (gomp_new_team): If GOMP_USE_ALIGNED_WORK_SHARES, use
	gomp_aligned_alloc instead of team_malloc.
2021-11-18 09:10:40 +01:00
Jakub Jelinek 7a2aa63fad libgomp: Fix up aligned_alloc arguments [PR102838]
C says that aligned_alloc size must be an integral multiple of alignment.
While glibc doesn't care about it, apparently Solaris does.
So, this patch decreases the priority of aligned_alloc among the other
variants because it needs more work and can waste more memory and rounds
up the size to multiple of alignment.

2021-11-18  Jakub Jelinek  <jakub@redhat.com>

	PR libgomp/102838
	* alloc.c (gomp_aligned_alloc): Prefer _aligned_alloc over
	memalign over posix_memalign over aligned_alloc over fallback
	with malloc instead of aligned_alloc over _aligned_alloc over
	posix_memalign over memalign over fallback with malloc.  For
	aligned_alloc, round up size up to multiple of al.
2021-11-18 09:07:31 +01:00
GCC Administrator 6b1695f4a0 Daily bump. 2021-11-17 00:16:29 +00:00
Jakub Jelinek 9ceaf0fee3 libgomp: Mark thread_limit clause to target construct as implemented
After the Fortran changes we can mark it as implemented...

2021-11-16  Jakub Jelinek  <jakub@redhat.com>

	* libgomp.texi (OpenMP 5.1): Mark thread_limit clause to target
	construct as implemented.
2021-11-16 10:21:56 +01:00
GCC Administrator e2b57363fc Daily bump. 2021-11-16 00:16:31 +00:00
Tobias Burnus 82ec4cb3c4 Fortran: openmp: Add support for thread_limit clause on target
gcc/fortran/ChangeLog:

	* openmp.c (OMP_TARGET_CLAUSES): Add thread_limit.
	* trans-openmp.c (gfc_split_omp_clauses): Add thread_limit also to
	teams.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/thread-limit-1.f90: New test.
2021-11-15 15:44:11 +01:00
Jakub Jelinek aea7238683 openmp: Add support for thread_limit clause on target
OpenMP 5.1 says that thread_limit clause can also appear on target,
and similarly to teams should affect the thread-limit-var ICV.
On combined target teams, the clause goes to both.

We actually passed thread_limit internally on target already before,
but only used it for gcn/ptx offloading to hint how many threads should be
created and for ptx didn't set thread_limit_var in that case.
Similarly for host fallback.
Also, I found that we weren't copying the args array that contains encoded
thread_limit and num_teams clause for target (etc.) for async target.

2021-11-15  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* gimplify.c (optimize_target_teams): Only add OMP_CLAUSE_THREAD_LIMIT
	to OMP_TARGET_CLAUSES if it isn't there already.
gcc/c-family/
	* c-omp.c (c_omp_split_clauses) <case OMP_CLAUSE_THREAD_LIMIT>:
	Duplicate to both OMP_TARGET and OMP_TEAMS.
gcc/c/
	* c-parser.c (OMP_TARGET_CLAUSE_MASK): Add
	PRAGMA_OMP_CLAUSE_THREAD_LIMIT.
gcc/cp/
	* parser.c (OMP_TARGET_CLAUSE_MASK): Add
	PRAGMA_OMP_CLAUSE_THREAD_LIMIT.
libgomp/
	* task.c (gomp_create_target_task): Copy args array as well.
	* target.c (gomp_target_fallback): Add args argument.
	Set gomp_icv (true)->thread_limit_var if thread_limit is present.
	(GOMP_target): Adjust gomp_target_fallback caller.
	(GOMP_target_ext): Likewise.
	(gomp_target_task_fn): Likewise.
	* config/nvptx/team.c (gomp_nvptx_main): Set
	gomp_global_icv.thread_limit_var.
	* testsuite/libgomp.c-c++-common/thread-limit-1.c: New test.
2021-11-15 13:20:53 +01:00
Jakub Jelinek 9fa72756d9 libgomp, nvptx: Honor OpenMP 5.1 num_teams lower bound
Here is a PTX implementation of what I was talking about, that for
num_teams_upper 0 or whenever num_teams_lower <= num_blocks, the current
implementation is fine but if the user explicitly asks for more
teams than we can provide in hardware, we need to stop assuming that
omp_get_team_num () is equal to the hw team id, but instead need to use some
team specific memory (it is .shared for PTX), or if none is
provided, array indexed by the hw team id and run some teams serially within
the same hw thread.

2021-11-15  Jakub Jelinek  <jakub@redhat.com>

	* config/nvptx/team.c (__gomp_team_num): Define as
	__attribute__((shared)) var.
	(gomp_nvptx_main): Initialize __gomp_team_num to 0.
	* config/nvptx/target.c (__gomp_team_num): Declare as
	extern __attribute__((shared)) var.
	(GOMP_teams4): Use __gomp_team_num as the team number instead of
	%ctaid.x.  If first, initialize it to %ctaid.x.  If num_teams_lower
	is bigger than num_blocks, use num_teams_lower teams and arrange for
	bumping of __gomp_team_num if !first and returning false once we run
	out of teams.
	* config/nvptx/teams.c (__gomp_team_num): Declare as
	extern __attribute__((shared)) var.
	(omp_get_team_num): Return __gomp_team_num value instead of %ctaid.x.
2021-11-15 09:20:52 +01:00
Jakub Jelinek d294459720 libgomp: Add a testcase for omp_get_num_teams inside of target inside of host teams
This is https://github.com/OpenMP/spec/issues/3183
There is an agreement that we should return 1 team inside of target,
even if that target is inside of host teams.  We were doing that
when offloading and not during host fallback, r12-5151 should fix that
even for host fallback.

2021-11-15  Jakub Jelinek  <jakub@redhat.com>

	* testsuite/libgomp.c/teams-5.c: New test.
2021-11-15 08:58:39 +01:00
GCC Administrator af2852b9dc Daily bump. 2021-11-13 00:16:39 +00:00
Jakub Jelinek f49c7a4fb2 libgomp: Unbreak gcn offload build
My recent libgomp change apparently broke libgomp build for gcn offloading.
The problem is that gcn, unlike nvptx, doesn't override teams.c source file
and the patch I've committed assumed all the non-LIBGOMP_USE_PTHREADS targets
do not use it.  My understanding is that gcn included omp_get_num_teams
and omp_get_team_num definitions in both icv-device.o and teams.o,
with the definitions only in the former working correctly.

This patch brings gcn into sync with how nvptx does it, that teams.c
is overridden, provides a dummy GOMP_teams_reg and omp_get_{num_teams,team_num}
definitions and icv-device.c doesn't provide those.

2021-11-12  Jakub Jelinek  <jakub@redhat.com>

	PR target/103201
	* config/gcn/icv-device.c (omp_get_num_teams, omp_get_team_num): Move
	to ...
	* config/gcn/teams.c: ... here.  New file.
2021-11-12 16:11:02 +01:00
Chung-Lin Tang b7e2048063 openmp: Relax handling of implicit map vs. existing device mappings
This patch implements relaxing the requirements when a map with the implicit
attribute encounters an overlapping existing map. As the OpenMP 5.0 spec
describes on page 320, lines 18-27 (and 5.1 spec, page 352, lines 13-22):

"If a single contiguous part of the original storage of a list item with an
 implicit data-mapping attribute has corresponding storage in the device data
 environment prior to a task encountering the construct that is associated with
 the map clause, only that part of the original storage will have corresponding
 storage in the device data environment as a result of the map clause."

2021-11-12  Chung-Lin Tang  <cltang@codesourcery.com>

include/ChangeLog:

	* gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_3): Define special bit macro.
	(GOMP_MAP_IMPLICIT): New special map kind bits value.
	(GOMP_MAP_FLAG_SPECIAL_BITS): Define helper mask for whole set of
	special map kind bits.
	(GOMP_MAP_IMPLICIT_P): New predicate macro for implicit map kinds.

gcc/ChangeLog:

	* tree.h (OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P): New access macro for
	'implicit' bit, using 'base.deprecated_flag' field of tree_node.
	* tree-pretty-print.c (dump_omp_clause): Add support for printing
	implicit attribute in tree dumping.
	* gimplify.c (gimplify_adjust_omp_clauses_1):
	Set OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P to 1 if map clause is implicitly
	created.
	(gimplify_adjust_omp_clauses): Adjust place of adding implicitly created
	clauses, from simple append, to starting of list, after non-map clauses.
	* omp-low.c (lower_omp_target): Add GOMP_MAP_IMPLICIT bits into kind
	values passed to libgomp for implicit maps.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/target-implicit-map-1.c: New test.
	* c-c++-common/goacc/combined-reduction.c: Adjust scan test pattern.
	* c-c++-common/goacc/firstprivate-mappings-1.c: Likewise.
	* c-c++-common/goacc/mdc-1.c: Likewise.
	* g++.dg/goacc/firstprivate-mappings-1.C: Likewise.

libgomp/ChangeLog:

	* target.c (gomp_map_vars_existing): Add 'bool implicit' parameter, add
	implicit map handling to allow a "superset" existing map as valid case.
	(get_kind): Adjust to filter out GOMP_MAP_IMPLICIT bits in return value.
	(get_implicit): New function to extract implicit status.
	(gomp_map_fields_existing): Adjust arguments in calls to
	gomp_map_vars_existing, and add uses of get_implicit.
	(gomp_map_vars_internal): Likewise.
	* testsuite/libgomp.c-c++-common/target-implicit-map-1.c: New test.
2021-11-12 20:29:48 +08:00
Jakub Jelinek 7d6da11fce openmp: Honor OpenMP 5.1 num_teams lower bound
The following patch implements what I've been talking about earlier,
honor that for explicit num_teams clause we create at least the
lower-bound (if not specified, upper-bound) teams in the league.
For host fallback, it still means we only have one thread doing all the
teams, sequentially one after another.
For PTX and GCN, I think the new teams-2.c test and maybe teams-4.c too
will or might fail.
For these offloads, I think it is ok to remove symbols no longer used
from libgomp.a.
If num_teams_lower is bigger than the provided num_blocks or num_workgroups,
we should arrange for gomp_num_teams_var to be num_teams_lower - 1,
stop using the %ctaid.x or __builtin_gcn_dim_pos (0) for omp_get_team_num ()
and instead use for it some .shared var that GOMP_teams4 initializes to
%ctaid.x or __builtin_gcn_dim_pos (0) when first and for !first
increment that by num_blocks or num_workgroups each time and only
return false when we are above num_teams_lower.
Any help with actually implementing this for the 2 architectures highly
appreciated.

2021-11-12  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* omp-builtins.def (BUILT_IN_GOMP_TEAMS): Remove.
	(BUILT_IN_GOMP_TEAMS4): New.
	* builtin-types.def (BT_FN_VOID_UINT_UINT): Remove.
	(BT_FN_BOOL_UINT_UINT_UINT_BOOL): New.
	* omp-low.c (lower_omp_teams): Use GOMP_teams4 instead of
	GOMP_teams, pass to it also num_teams lower-bound expression
	or a dup of upper-bound if it is missing and a flag whether
	it is the first call or not.
gcc/fortran/
	* types.def (BT_FN_VOID_UINT_UINT): Remove.
	(BT_FN_BOOL_UINT_UINT_UINT_BOOL): New.
libgomp/
	* libgomp_g.h (GOMP_teams4): Declare.
	* libgomp.map (GOMP_5.1): Export GOMP_teams4.
	* target.c (GOMP_teams4): New function.
	* config/nvptx/target.c (GOMP_teams): Remove.
	(GOMP_teams4): New function.
	* config/gcn/target.c (GOMP_teams): Remove.
	(GOMP_teams4): New function.
	* testsuite/libgomp.c/teams-4.c (main): Expect exactly 2
	teams instead of <= 2.
	* testsuite/libgomp.c-c++-common/teams-2.c: New test.
2021-11-12 12:41:22 +01:00
GCC Administrator b39265d4fe Daily bump. 2021-11-12 00:16:32 +00:00
Tobias Burnus 407eaad25f Fortran/openmp: Add support for 2 argument num_teams clause
Fortran part to commit r12-5146-g48d7327f2aaf65

gcc/fortran/ChangeLog:

	* gfortran.h (struct gfc_omp_clauses): Rename num_teams to
	num_teams_upper, add num_teams_upper.
	* dump-parse-tree.c (show_omp_clauses): Update to handle
	lower-bound num_teams clause.
	* frontend-passes.c (gfc_code_walker): Likewise
	* openmp.c (gfc_free_omp_clauses, gfc_match_omp_clauses,
	resolve_omp_clauses): Likewise.
	* trans-openmp.c (gfc_trans_omp_clauses, gfc_split_omp_clauses,
	gfc_trans_omp_target): Likewise.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/teams-1.f90: New test.
2021-11-11 17:27:00 +01:00
Jakub Jelinek fa4fcb111a libgomp: Use TLS storage for omp_get_num_teams()/omp_get_team_num() values
When thinking about GOMP_teams3, I've realized that using global variables
for the values returned by omp_get_num_teams()/omp_get_team_num() calls
is incorrect even with our right now dumb way of implementing host teams.
The problems are two, one is if host teams is used from multiple pthread_create
created threads - the spec says that host teams can't be nested inside of
explicit parallel or other teams constructs, but with pthread_create the
standard says obviously nothing about it.  Another more important thing
is host fallback, right now we don't do anything for omp_get_num_teams()
or omp_get_team_num() which was fine before host teams was introduced and
the 5.1 requirement that num_teams clause specifies minimum of teams, but
with the global vars it means inside of target teams num_teams (2) we happily
return omp_get_num_teams() == 4 if the target teams is inside of host teams
with num_teams(4).  With target fallback being invoked from parallel
regions global vars simply can't work right on the host.

So, this patch moves them to struct gomp_thread and propagates those for
parallel to child threads.  For host fallback, the implicit zeroing of
*thr results in us returning omp_get_num_teams () == 1 and
omp_get_team_num () == 0 which is fine for target teams without num_teams
clause, for target teams with num_teams clause something to work on and
for target without teams nested in it I've asked on omp-lang what should
be done.

2021-11-11  Jakub Jelinek  <jakub@redhat.com>

	* libgomp.h (struct gomp_thread): Add num_teams and team_num members.
	* team.c (struct gomp_thread_start_data): Likewise.
	(gomp_thread_start): Initialize thr->num_teams and thr->team_num.
	(gomp_team_start): Initialize start_data->num_teams and
	start_data->team_num.  Update nthr->num_teams and nthr->team_num.
	* teams.c (gomp_num_teams, gomp_team_num): Remove.
	(GOMP_teams_reg): Set and restore thr->num_teams and thr->team_num
	instead of gomp_num_teams and gomp_team_num.
	(omp_get_num_teams): Use thr->num_teams + 1 instead of gomp_num_teams.
	(omp_get_team_num): Use thr->team_num instead of gomp_team_num.
	* testsuite/libgomp.c/teams-4.c: New test.
2021-11-11 13:57:31 +01:00
Jakub Jelinek 48d7327f2a openmp: Add support for 2 argument num_teams clause
In OpenMP 5.1, num_teams clause can accept either one expression as before,
but it in that case changed meaning, rather than create <= expression
teams it is now create == expression teams.  Or it accepts two expressions
separated by :, with the meaning that the first is low bound and second upper
bound on how many teams should be created.  The other ways to set number of
teams are upper bounds with lower bound of 1.

The following patch does parsing of this for C/C++.  For host teams, we
actually don't need to do anything further right now, we always create
(pretend to create) exactly the requested number of teams, so we can just
evaluate and throw away the lower bound for now.
For teams nested in target, we don't guarantee that though and further
work will be needed.
In particular, omplower now turns the teams part of:
struct S { S (); S (const S &); ~S (); int s; };
void bar (S &, S &);
int baz ();
_Pragma ("omp declare target to (baz)");

void
foo (void)
{
  S a, b;
  #pragma omp target private (a) map (b)
  {
    #pragma omp teams firstprivate (b) num_teams (baz ())
    {
      bar (a, b);
    }
  }
}
into:
  retval.0 = baz ();
  retval.1 = retval.0;
  {
    unsigned int retval.3;
    struct S * D.2549;
    struct S b;

    retval.3 = (unsigned int) retval.1;
    D.2549 = .omp_data_i->b;
    S::S (&b, D.2549);
    #pragma omp teams num_teams(retval.1) firstprivate(b) shared(a)
    __builtin_GOMP_teams (retval.3, 0);
    {
      bar (&a, &b);
    }
    S::~S (&b);
    #pragma omp return(nowait)
  }
IMHO we want a new API, say GOMP_teams3 which will take 3 arguments
instead of 2 (the lower and upper bounds from num_teams and thread_limit)
and will return a bool whether it should do the teams body or not.
And, we should add right before outermost {} above
while (__builtin_GOMP_teams3 ((unsigned) retval.1, (unsigned) retval.1, 0))
and remove the __builtin_GOMP_teams call.  The current function performs
exit equivalent (at least on NVPTX) which seems bad because that means
the destructors of e.g. private variables on target aren't invoked, and
at the current placement neither destructors of the already constructed
privatized variables in teams.
I'll do this next on the compiler side, but I'm afraid I'll need help
with the nvptx and amdgcn implementations.  E.g. for nvptx, we won't be
able to use %ctaid.x .  I think ideal would be to use a .shared
integer variable for the omp_get_team_num value, but I don't have any
experience with that, are .shared variables zero initialized by default,
or do they have random value at start?  PTX docs say they aren't initializable.

2021-11-11  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* tree.h (OMP_CLAUSE_NUM_TEAMS_EXPR): Rename to ...
	(OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR): ... this.
	(OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR): Define.
	* tree.c (omp_clause_num_ops): Increase num ops for
	OMP_CLAUSE_NUM_TEAMS to 2.
	* tree-pretty-print.c (dump_omp_clause): Print optional lower bound
	for OMP_CLAUSE_NUM_TEAMS.
	* gimplify.c (gimplify_scan_omp_clauses): Gimplify
	OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR if non-NULL.
	(optimize_target_teams): Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead
	of OMP_CLAUSE_NUM_TEAMS_EXPR.  Handle OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR.
	* omp-low.c (lower_omp_teams): Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR
	instead of OMP_CLAUSE_NUM_TEAMS_EXPR.
	* omp-expand.c (expand_teams_call, get_target_arguments): Likewise.
gcc/c/
	* c-parser.c (c_parser_omp_clause_num_teams): Parse optional
	lower-bound and store it into OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR.
	Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of
	OMP_CLAUSE_NUM_TEAMS_EXPR.
	(c_parser_omp_target): For OMP_CLAUSE_NUM_TEAMS evaluate before
	combined target teams even lower-bound expression.
gcc/cp/
	* parser.c (cp_parser_omp_clause_num_teams): Parse optional
	lower-bound and store it into OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR.
	Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of
	OMP_CLAUSE_NUM_TEAMS_EXPR.
	(cp_parser_omp_target): For OMP_CLAUSE_NUM_TEAMS evaluate before
	combined target teams even lower-bound expression.
	* semantics.c (finish_omp_clauses): Handle
	OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR of OMP_CLAUSE_NUM_TEAMS clause.
	* pt.c (tsubst_omp_clauses): Likewise.
	(tsubst_expr): For OMP_CLAUSE_NUM_TEAMS evaluate before
	combined target teams even lower-bound expression.
gcc/fortran/
	* trans-openmp.c (gfc_trans_omp_clauses): Use
	OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR.
gcc/testsuite/
	* c-c++-common/gomp/clauses-1.c (bar): Supply lower-bound expression
	to half of the num_teams clauses.
	* c-c++-common/gomp/num-teams-1.c: New test.
	* c-c++-common/gomp/num-teams-2.c: New test.
	* g++.dg/gomp/attrs-1.C (bar): Supply lower-bound expression
	to half of the num_teams clauses.
	* g++.dg/gomp/attrs-2.C (bar): Likewise.
	* g++.dg/gomp/num-teams-1.C: New test.
	* g++.dg/gomp/num-teams-2.C: New test.
libgomp/
	* testsuite/libgomp.c-c++-common/teams-1.c: New test.
2021-11-11 09:42:47 +01:00
GCC Administrator c9b1334eec Daily bump. 2021-11-10 00:16:28 +00:00
Thomas Schwinge 00c9ce13a6 Restore 'GOMP_OPENACC_DIM' environment variable parsing
... that got broken by recent commit c057ed9c52
"openmp: Fix up strtoul and strtoull uses in libgomp", resulting in spurious
FAILs for tests specifying 'dg-set-target-env-var "GOMP_OPENACC_DIM" "[...]"'.

	libgomp/
	* env.c (parse_gomp_openacc_dim): Restore parsing.
2021-11-09 16:51:57 +01:00
GCC Administrator 0ef944629a Daily bump. 2021-10-31 00:16:24 +00:00
Tobias Burnus 948d461954 OpenMP: Add strictly nested API call check [PR102972]
The teams construct only permits omp_get_num_teams and omp_get_team_num
as API call in strictly nested regions - check for it.

Additionally, for Fortran, using DECL_NAME does not show the mangled
name, hence, DECL_ASSEMBLER_NAME had to be used to.

Finally, 'target device(ancestor:1)' wrongly rejected non-API calls
as well.

	PR middle-end/102972
gcc/ChangeLog:

	* omp-low.c (omp_runtime_api_call): Use DECL_ASSEMBLER_NAME to get
	internal Fortran name; new permit_num_teams arg to permit
	omp_get_num_teams and omp_get_team_num.
	(scan_omp_1_stmt): Update call to it, add missing call for
	reverse offload, and check for strictly nested API calls in teams.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/target-device-ancestor-3.c: Add non-API
	routine test.
	* gfortran.dg/gomp/order-6.f90: Add missing bind(C).
	* c-c++-common/gomp/teams-3.c: New test.
	* gfortran.dg/gomp/teams-3.f90: New test.
	* gfortran.dg/gomp/teams-4.f90: New test.

libgomp/ChangeLog:
	* testsuite/libgomp.c-c++-common/icv-3.c: Nest API calls inside
	parallel construct.
	* testsuite/libgomp.c-c++-common/icv-4.c: Likewise.
	* testsuite/libgomp.c/target-3.c: Likewise.
	* testsuite/libgomp.c/target-5.c: Likewise.
	* testsuite/libgomp.c/target-6.c: Likewise.
	* testsuite/libgomp.c/target-teams-1.c: Likewise.
	* testsuite/libgomp.c/teams-1.c: Likewise.
	* testsuite/libgomp.c/thread-limit-2.c: Likewise.
	* testsuite/libgomp.c/thread-limit-3.c: Likewise.
	* testsuite/libgomp.c/thread-limit-4.c: Likewise.
	* testsuite/libgomp.c/thread-limit-5.c: Likewise.
	* testsuite/libgomp.fortran/icv-3.f90: Likewise.
	* testsuite/libgomp.fortran/icv-4.f90: Likewise.
	* testsuite/libgomp.fortran/teams1.f90: Likewise.
2021-10-30 23:45:32 +02:00
GCC Administrator 4c61300f2b Daily bump. 2021-10-30 00:16:25 +00:00
Aldy Hernandez 4b3a325f07 Remove VRP threader passes in exchange for better threading pre-VRP.
This patch upgrades the pre-VRP threading passes to fully resolving
backward threaders, and removes the post-VRP threading passes altogether.
With it, we reduce the number of threaders in our pipeline from 9 to 7.

This will leave DOM as the only forward threader client.  When the ranger
can handle floats, we should be able to upgrade the pre-DOM threaders to
fully resolving threaders and kill the embedded DOM threader.

The numbers are as follows:

	prev: # threads in backward + vrp-threaders = 92624
	now:  # threads in backward threaders = 94275
	Gain: +1.78%

	prev: # total threads: 189495
	now:  # total threads: 193714
	Gain: +2.22%

	The numbers are not as great as my initial proposal, but I've
	recently pushed all the work that got us to this point ;-).

And... the compilation improves by 1.32%!

There's a regression on uninit-pred-7_a.c that I've yet to look at.  I
want to make sure it's not a missing thread.  If it is, I'll create a PR
and own it.

Also, the tree-ssa/phi_on_compare-*.c tests have all regressed.  This
seems to be some special case the forward threader handles that the
backward threader does not (edge_forwards_cmp_to_conditional_jump*).
I haven't dug deep to see if this is solveable within our
infrastructure, but a cursory look shows that even though the VRP
threader threads this, the *.optimized dump ends with more conditional
jumps than without the optimization.  I'd like to punt on this for
now, because DOM actually catches this through its lone use of the
forward threader (I've adjusted the tests).  However, we will need to
address this sooner or later, if indeed it's still improving the final
assembly.

gcc/ChangeLog:

	* passes.def: Replace the pass_thread_jumps before VRP* with
	pass_thread_jumps_full.  Remove all pass_vrp_threader instances.
	* tree-ssa-threadbackward.c (pass_data_thread_jumps_full):
	Remove hyphen from "thread-full" name.

libgomp/ChangeLog:

	* testsuite/libgomp.graphite/force-parallel-4.c: Adjust for threading changes.
	* testsuite/libgomp.graphite/force-parallel-8.c: Same.

gcc/testsuite/ChangeLog:

	* gcc.dg/loop-unswitch-2.c: Adjust for threading changes.
	* gcc.dg/old-style-asm-1.c: Same.
	* gcc.dg/tree-ssa/phi_on_compare-1.c: Same.
	* gcc.dg/tree-ssa/phi_on_compare-2.c: Same.
	* gcc.dg/tree-ssa/phi_on_compare-3.c: Same.
	* gcc.dg/tree-ssa/phi_on_compare-4.c: Same.
	* gcc.dg/tree-ssa/pr20701.c: Same.
	* gcc.dg/tree-ssa/pr21001.c: Same.
	* gcc.dg/tree-ssa/pr21294.c: Same.
	* gcc.dg/tree-ssa/pr21417.c: Same.
	* gcc.dg/tree-ssa/pr21559.c: Same.
	* gcc.dg/tree-ssa/pr21563.c: Same.
	* gcc.dg/tree-ssa/pr49039.c: Same.
	* gcc.dg/tree-ssa/pr59597.c: Same.
	* gcc.dg/tree-ssa/pr61839_1.c: Same.
	* gcc.dg/tree-ssa/pr61839_3.c: Same.
	* gcc.dg/tree-ssa/pr66752-3.c: Same.
	* gcc.dg/tree-ssa/pr68198.c: Same.
	* gcc.dg/tree-ssa/pr77445-2.c: Same.
	* gcc.dg/tree-ssa/pr77445.c: Same.
	* gcc.dg/tree-ssa/ranger-threader-1.c: Same.
	* gcc.dg/tree-ssa/ranger-threader-2.c: Same.
	* gcc.dg/tree-ssa/ranger-threader-4.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-1.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-11.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-12.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-14.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-16.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
	* gcc.dg/tree-ssa/ssa-thread-14.c: Same.
	* gcc.dg/tree-ssa/ssa-thread-backedge.c: Same.
	* gcc.dg/tree-ssa/ssa-vrp-thread-1.c: Same.
	* gcc.dg/tree-ssa/vrp02.c: Same.
	* gcc.dg/tree-ssa/vrp03.c: Same.
	* gcc.dg/tree-ssa/vrp05.c: Same.
	* gcc.dg/tree-ssa/vrp06.c: Same.
	* gcc.dg/tree-ssa/vrp07.c: Same.
	* gcc.dg/tree-ssa/vrp08.c: Same.
	* gcc.dg/tree-ssa/vrp09.c: Same.
	* gcc.dg/tree-ssa/vrp33.c: Same.
	* gcc.dg/uninit-pred-9_b.c: Same.
	* gcc.dg/uninit-pred-7_a.c: xfail.
2021-10-29 17:57:27 +02:00
GCC Administrator 04a2cf3fd6 Daily bump. 2021-10-28 00:16:39 +00:00
Jakub Jelinek eef8114906 openmp: Document that non-rect loops are not supported in Fortran yet
I've found we claim to support non-rectangular loops, but don't actually
support those in Fortran, as can be seen on:
  integer i, j
  !$omp parallel do collapse(2)
  do i = 0, 10
    do j = 0, i
    end do
  end do
end
To support this, the Fortran FE needs to allow the valid forms of
non-rectangular loops and disallow others, so mainly it needs its
updated version of c-omp.c c_omp_check_loop_iv etc., plus for non-rectangular
lb or ub expressions emit a TREE_VEC instead of normal expression as the C/C++ FE
do, plus testsuite coverage.

2021-10-27  Jakub Jelinek  <jakub@redhat.com>

	* libgomp.texi (OpenMP 5.0): Mention that Non-rectangular loop nests
	aren't implemented for Fortran yet.
2021-10-27 09:24:46 +02:00
Jakub Jelinek 2084b5f42a openmp: Allow non-rectangular loops with pointer iterators
This patch handles pointer iterators for non-rectangular loops.  They are
more limited than integral iterators of non-rectangular loops, in particular
only var-outer, var-outer + a2, a2 + var-outer or var-outer - a2 can appear
in lb or ub where a2 is some integral loop invariant expression, so no e.g.
multiplication etc.

2021-10-27  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* omp-expand.c (expand_omp_for_init_counts): Handle non-rectangular
	iterators with pointer types.
	(expand_omp_for_init_vars, extract_omp_for_update_vars): Likewise.
gcc/c-family/
	* c-omp.c (c_omp_check_loop_iv_r): Don't clear 3rd bit for
	POINTER_PLUS_EXPR.
	(c_omp_check_nonrect_loop_iv): Handle POINTER_PLUS_EXPR.
	(c_omp_check_loop_iv): Set kind even if the iterator is non-integral.
gcc/testsuite/
	* c-c++-common/gomp/loop-8.c: New test.
	* c-c++-common/gomp/loop-9.c: New test.
libgomp/
	* testsuite/libgomp.c/loop-26.c: New test.
	* testsuite/libgomp.c/loop-27.c: New test.
2021-10-27 09:22:07 +02:00
GCC Administrator b621508d6f Daily bump. 2021-10-26 00:16:26 +00:00
Tobias Burnus 72dc270be7 libgomp.oacc-c-c++-common/loop-gwv-2.c: Use __builtin_alloca
Some systems do not have <alloca.h> but provide alloca differently, e.g.
via stdlib.h. Do it like other testcases do and use __builtin_alloca.

libgomp/ChangeLog:

	PR testsuite/102910
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: Use __builtin_alloca
	instead of #include <alloca.h> + alloca.
2021-10-25 20:48:38 +02:00
GCC Administrator ae5c540662 Daily bump. 2021-10-22 00:16:31 +00:00
Chung-Lin Tang 2e4659199e openmp: Fortran strictly-structured blocks support
This implements strictly-structured blocks support for Fortran, as specified in
OpenMP 5.2. This now allows using a Fortran BLOCK construct as the body of most
OpenMP constructs, with a "!$omp end ..." ending directive optional for that
form.

gcc/fortran/ChangeLog:

	* decl.c (gfc_match_end): Add COMP_OMP_STRICTLY_STRUCTURED_BLOCK case
	together with COMP_BLOCK.
	* parse.c (parse_omp_structured_block): Change return type to
	'gfc_statement', add handling for strictly-structured block case, adjust
	recursive calls to parse_omp_structured_block.
	(parse_executable): Adjust calls to parse_omp_structured_block.
	* parse.h (enum gfc_compile_state): Add
	COMP_OMP_STRICTLY_STRUCTURED_BLOCK.
	* trans-openmp.c (gfc_trans_omp_workshare): Add EXEC_BLOCK case
	handling.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/cancel-1.f90: Adjust testcase.
	* gfortran.dg/gomp/nesting-3.f90: Adjust testcase.
	* gfortran.dg/gomp/strictly-structured-block-1.f90: New test.
	* gfortran.dg/gomp/strictly-structured-block-2.f90: New test.
	* gfortran.dg/gomp/strictly-structured-block-3.f90: New test.

libgomp/ChangeLog:

	* libgomp.texi (Support of strictly structured blocks in Fortran):
	Adjust to 'Y'.
	* testsuite/libgomp.fortran/task-reduction-16.f90: Adjust testcase.
2021-10-21 14:57:25 +08:00
GCC Administrator 674dda6be0 Daily bump. 2021-10-21 00:16:29 +00:00
Chung-Lin Tang d98626bf45 openmp: in_reduction support for Fortran
This patch implements support for the in_reduction clause for Fortran.
It also includes more completion of the taskgroup construct inside the
Fortran front-end, thus allowing task_reduction to work for task and
target constructs.

gcc/fortran/ChangeLog:

	* openmp.c (gfc_match_omp_clause_reduction): Add 'openmp_target' default
	false parameter. Add 'always,tofrom' map for OMP_LIST_IN_REDUCTION case.
	(gfc_match_omp_clauses): Add 'openmp_target' default false parameter,
	adjust call to gfc_match_omp_clause_reduction.
	(match_omp): Adjust call to gfc_match_omp_clauses
	* trans-openmp.c (gfc_trans_omp_taskgroup): Add call to
	gfc_match_omp_clause, create and return block.

gcc/ChangeLog:

	* omp-low.c (omp_copy_decl_2): For !ctx, use record_vars to add new copy
	as local variable.
	(scan_sharing_clauses): Place copy of OMP_CLAUSE_IN_REDUCTION decl in
	ctx->outer instead of ctx.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/reduction4.f90: Adjust omp target in_reduction' scan
	pattern.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/target-in-reduction-1.f90: New test.
	* testsuite/libgomp.fortran/target-in-reduction-2.f90: New test.
2021-10-20 23:25:02 +08:00
Jakub Jelinek c7abdf46fb openmp: Fix up struct gomp_work_share handling [PR102838]
If GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC is not defined, the intent was to
treat the split of the structure between first cacheline (64 bytes)
as mostly write-once, use afterwards and second cacheline as rw just
as an optimization.  But as has been reported, with vectorization enabled
at -O2 it can now result in aligned vector 16-byte or larger stores.
When not having posix_memalign/aligned_alloc/memalign or other similar API,
alloc.c emulates it but it needs to allocate extra memory for the dynamic
realignment.
So, for the GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC not defined case, this patch
stops using aligned (64) attribute in the middle of the structure and instead
inserts padding that puts the second half of the structure at offset 64 bytes.

And when GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC is defined, usually it was allocated
as aligned, but for the orphaned case it could still be allocated just with
gomp_malloc without guaranteed proper alignment.

2021-10-20  Jakub Jelinek  <jakub@redhat.com>

	PR libgomp/102838
	* libgomp.h (struct gomp_work_share_1st_cacheline): New type.
	(struct gomp_work_share): Only use aligned(64) attribute if
	GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC is defined, otherwise just
	add padding before lock to ensure lock is at offset 64 bytes
	into the structure.
	(gomp_workshare_struct_check1, gomp_workshare_struct_check2):
	New poor man's static assertions.
	* work.c (gomp_work_share_start): Use gomp_aligned_alloc instead of
	gomp_malloc if GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC.
2021-10-20 09:34:51 +02:00
Aldy Hernandez d8edfadfc7 Disallow loop rotation and loop header crossing in jump threaders.
There is a lot of fall-out from this patch, as there were many threading
tests that assumed the restrictions introduced by this patch were valid.
Some tests have merely shifted the threading to after loop
optimizations, but others ended up with no threading opportunities at
all.  Surprisingly some tests ended up with more total threads.  It was
a crapshoot all around.

On a postive note, there are 6 tests that no longer XFAIL, and one
guality test which now passes.

I felt a bit queasy about such a fundamental change wrt threading, so I
ran it through my callgrind test harness (.ii files from a bootstrap).
There was no change in overall compilation, DOM, or the VRP threaders.

However, there was a slight increase of 1.63% in the backward threader.
I'm pretty sure we could reduce this if we incorporated the restrictions
into their profitability code.  This way we could stop the search when
we ran into one of these restrictions.  Not sure it's worth it at this
point.

Tested on x86-64 Linux.

Co-authored-by: Richard Biener <rguenther@suse.de>

gcc/ChangeLog:

	* tree-ssa-threadupdate.c (cancel_thread): Dump threading reason
	on the same line as the threading cancellation.
	(jt_path_registry::cancel_invalid_paths): Avoid rotating loops.
	Avoid threading through loop headers where the path remains in the
	loop.

libgomp/ChangeLog:

	* testsuite/libgomp.graphite/force-parallel-5.c: Remove xfail.

gcc/testsuite/ChangeLog:

	* gcc.dg/Warray-bounds-87.c: Remove xfail.
	* gcc.dg/analyzer/pr94851-2.c: Remove xfail.
	* gcc.dg/graphite/pr69728.c: Remove xfail.
	* gcc.dg/graphite/scop-dsyr2k.c: Remove xfail.
	* gcc.dg/graphite/scop-dsyrk.c: Remove xfail.
	* gcc.dg/shrink-wrap-loop.c: Remove xfail.
	* gcc.dg/loop-8.c: Adjust for new threading restrictions.
	* gcc.dg/tree-ssa/ifc-20040816-1.c: Same.
	* gcc.dg/tree-ssa/pr21559.c: Same.
	* gcc.dg/tree-ssa/pr59597.c: Same.
	* gcc.dg/tree-ssa/pr71437.c: Same.
	* gcc.dg/tree-ssa/pr77445-2.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-4.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
	* gcc.dg/vect/bb-slp-16.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-6.c: Remove.
	* gcc.dg/tree-ssa/ssa-dom-thread-18.c: Remove.
	* gcc.dg/tree-ssa/ssa-dom-thread-2a.c: Remove.
	* gcc.dg/tree-ssa/ssa-thread-invalid.c: New test.
2021-10-20 07:07:35 +02:00
GCC Administrator ce4d1f632f Daily bump. 2021-10-19 00:16:23 +00:00
Jakub Jelinek 3adcf7e104 openmp: Fix handling of numa_domains(1)
If numa-domains is used with num-places count, sometimes the function
could create more places than requested and crash.  This depended on the
content of /sys/devices/system/node/online file, e.g. if the file
contains
0-1,16-17
and all NUMA nodes contain at least one CPU in the cpuset of the program,
then numa_domains(2) or numa_domains(4) (or 5+) work fine while
numa_domains(1) or numa_domains(3) misbehave.  I.e. the function was able
to stop after reaching limit on the , separators (or trivially at the end),
but not within in the ranges.

2021-10-18  Jakub Jelinek  <jakub@redhat.com>

	* config/linux/affinity.c (gomp_affinity_init_numa_domains): Add
	&& gomp_places_list_len < count after nfirst <= nlast loop condition.
2021-10-18 15:00:46 +02:00
Tobias Burnus 64f9623765 Fortran: Fix Bind(C) Array-Descriptor Conversion
gfortran uses internally a different array descriptor ("gfc") as
Fortran 2018 alias TS291113 defines for C interoperability via
ISO_Fortran_binding.h ("CFI").  Hence, when calling a C function
from Fortran, it has to be converted in the callee - and if a
BIND(C) procedure is written in Fortran, the CFI argument has
to be converted to gfc in order work with the rest of the FE
code and the library calls.

Before this patch, part was handled in the FE generated code and
other parts in libgfortran.  With this patch, all code is generated
and CFI is defined as proper type - visible in the debugger and to
the middle end - avoiding both alias issues and missed optimization
issues.

This patch also fixes issues like: intent(out) deallocation in
the bind(C) callee, using the CFI descriptor also for allocatable
and pointer scalars and for len=* character strings.
For 'select rank', it also optimizes the code + avoid accessing
uninitialized memory if the dummy argument is allocatable/a pointer.
It additionally rejects passing a descriptorless type(*) to an
assumed-rank dummy argument. [F2018:C711]

	PR fortran/102086
	PR fortran/92189
	PR fortran/92621
	PR fortran/101308
	PR fortran/101309
	PR fortran/101635
	PR fortran/92482

gcc/fortran/ChangeLog:

	* decl.c (gfc_verify_c_interop_param): Remove 'sorry' for
	scalar allocatable/pointer and len=*.
	* expr.c (is_CFI_desc): Return true for for those.
	* gfortran.h (CFI_type_kind_shift, CFI_type_mask,
	CFI_type_from_type_kind, CFI_VERSION, CFI_MAX_RANK,
	CFI_attribute_pointer, CFI_attribute_allocatable,
	CFI_attribute_other, CFI_type_Integer, CFI_type_Logical,
	CFI_type_Real, CFI_type_Complex, CFI_type_Character,
	CFI_type_ucs4_char, CFI_type_struct, CFI_type_cptr,
	CFI_type_cfunptr, CFI_type_other): New #define.
	* trans-array.c (CFI_FIELD_BASE_ADDR, CFI_FIELD_ELEM_LEN,
	CFI_FIELD_VERSION, CFI_FIELD_RANK, CFI_FIELD_ATTRIBUTE,
	CFI_FIELD_TYPE, CFI_FIELD_DIM, CFI_DIM_FIELD_LOWER_BOUND,
	CFI_DIM_FIELD_EXTENT, CFI_DIM_FIELD_SM,
	gfc_get_cfi_descriptor_field, gfc_get_cfi_desc_base_addr,
	gfc_get_cfi_desc_elem_len, gfc_get_cfi_desc_version,
	gfc_get_cfi_desc_rank, gfc_get_cfi_desc_type,
	gfc_get_cfi_desc_attribute, gfc_get_cfi_dim_item,
	gfc_get_cfi_dim_lbound, gfc_get_cfi_dim_extent, gfc_get_cfi_dim_sm):
	New define/functions to access the CFI array descriptor.
	(gfc_conv_descriptor_type): New function for the GFC descriptor.
	(gfc_get_array_span): Handle expr of CFI descriptors and
	assumed-type descriptors.
	(gfc_trans_array_bounds): Remove 'static'.
	(gfc_conv_expr_descriptor): For assumed type, use the dtype of
	the actual argument.
	(structure_alloc_comps): Remove ' ' inside tabs.
	* trans-array.h (gfc_trans_array_bounds, gfc_conv_descriptor_type,
	gfc_get_cfi_desc_base_addr, gfc_get_cfi_desc_elem_len,
	gfc_get_cfi_desc_version, gfc_get_cfi_desc_rank,
	gfc_get_cfi_desc_type, gfc_get_cfi_desc_attribute,
	gfc_get_cfi_dim_lbound, gfc_get_cfi_dim_extent, gfc_get_cfi_dim_sm):
	New prototypes.
	* trans-decl.c (gfor_fndecl_cfi_to_gfc, gfor_fndecl_gfc_to_cfi):
	Remove global vars.
	(gfc_build_builtin_function_decls): Remove their initialization.
	(gfc_get_symbol_decl, create_function_arglist,
	gfc_trans_deferred_vars): Update for CFI.
	(convert_CFI_desc): Remove and replace by ...
	(gfc_conv_cfi_to_gfc): ... this function
	(gfc_generate_function_code): Call it; create local GFC var for CFI.
	* trans-expr.c (gfc_maybe_dereference_var): Handle CFI.
	(gfc_conv_subref_array_arg): Handle the if-noncontigous-only copy in
	when the result should be a descriptor.
	(gfc_conv_gfc_desc_to_cfi_desc): Completely rewritten.
	(gfc_conv_procedure_call): CFI fixes.
	* trans-openmp.c (gfc_omp_is_optional_argument,
	gfc_omp_check_optional_argument): Handle optional
	CFI.
	* trans-stmt.c (gfc_trans_select_rank_cases): Cleanup, avoid invalid
	code for allocatable/pointer dummies, which cannot be assumed size.
	* trans-types.c (gfc_cfi_descriptor_base): New global var.
	(gfc_get_dtype_rank_type): Skip rank init for rank < 0.
	(gfc_sym_type): Handle CFI dummies.
	(gfc_get_function_type): Update call.
	(gfc_get_cfi_dim_type, gfc_get_cfi_type): New.
	* trans-types.h (gfc_sym_type): Update prototype.
	(gfc_get_cfi_type): New prototype.
	* trans.c (gfc_trans_runtime_check): Make conditions more consistent
	to avoid '<logical> AND_THEN <long int>' in conditions.
	* trans.h (gfor_fndecl_cfi_to_gfc, gfor_fndecl_gfc_to_cfi): Remove
	global-var declaration.

libgfortran/ChangeLog:

	* ISO_Fortran_binding.h (CFI_type_cfunptr): Make unique type again.
	* runtime/ISO_Fortran_binding.c (cfi_desc_to_gfc_desc,
	gfc_desc_to_cfi_desc): Add comment that those are no longer called
	by new code.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/optional-bind-c.f90: New test.

gcc/testsuite/ChangeLog:

	* gfortran.dg/ISO_Fortran_binding_4.f90: Extend testcase.
	* gfortran.dg/PR100914.f90: Remove xfail.
	* gfortran.dg/PR100915.c: Expect CFI_type_cfunptr.
	* gfortran.dg/PR100915.f90: Handle CFI_type_cfunptr != CFI_type_cptr.
	* gfortran.dg/PR93963.f90: Extend select-rank tests.
	* gfortran.dg/bind-c-intent-out.f90: Change to dg-do run,
	update scan-dump.
	* gfortran.dg/bind_c_array_params_2.f90: Update/extend scan-dump.
	* gfortran.dg/bind_c_char_10.f90: Update scan-dump.
	* gfortran.dg/bind_c_char_8.f90: Remove dg-error "sorry".
	* gfortran.dg/c-interop/allocatable-dummy.f90: Remove xfail.
	* gfortran.dg/c-interop/c1255-1.f90: Likewise.
	* gfortran.dg/c-interop/c407c-1.f90: Update dg-error.
	* gfortran.dg/c-interop/cf-descriptor-5.f90: Remove xfail.
	* gfortran.dg/c-interop/cf-out-descriptor-3.f90: Likewise.
	* gfortran.dg/c-interop/cf-out-descriptor-4.f90: Likewise.
	* gfortran.dg/c-interop/cf-out-descriptor-5.f90: Likewise.
	* gfortran.dg/c-interop/contiguous-2.f90: Likewise.
	* gfortran.dg/c-interop/contiguous-3.f90: Likewise.
	* gfortran.dg/c-interop/deferred-character-1.f90: Likewise.
	* gfortran.dg/c-interop/deferred-character-2.f90: Likewise.
	* gfortran.dg/c-interop/fc-descriptor-3.f90: Likewise.
	* gfortran.dg/c-interop/fc-descriptor-5.f90: Likewise.
	* gfortran.dg/c-interop/fc-descriptor-6.f90: Likewise.
	* gfortran.dg/c-interop/fc-out-descriptor-3.f90: Likewise.
	* gfortran.dg/c-interop/fc-out-descriptor-4.f90: Likewise.
	* gfortran.dg/c-interop/fc-out-descriptor-5.f90: Likewise.
	* gfortran.dg/c-interop/fc-out-descriptor-6.f90: Likewise.
	* gfortran.dg/c-interop/ff-descriptor-5.f90: Likewise.
	* gfortran.dg/c-interop/ff-descriptor-6.f90: Likewise.
	* gfortran.dg/c-interop/fc-descriptor-7.f90: Remove xfail + extend.
	* gfortran.dg/c-interop/fc-descriptor-7-c.c: Update for changes.
	* gfortran.dg/c-interop/shape.f90: Add implicit none.
	* gfortran.dg/c-interop/typecodes-array-char-c.c: Add kind=4 char.
	* gfortran.dg/c-interop/typecodes-array-char.f90: Likewise.
	* gfortran.dg/c-interop/typecodes-array-float128.f90: Remove xfail.
	* gfortran.dg/c-interop/typecodes-scalar-basic.f90: Likewise.
	* gfortran.dg/c-interop/typecodes-scalar-float128.f90: Likewise.
	* gfortran.dg/c-interop/typecodes-scalar-int128.f90: Likewise.
	* gfortran.dg/c-interop/typecodes-scalar-longdouble.f90: Likewise.
	* gfortran.dg/iso_c_binding_char_1.f90: Remove dg-error "sorry".
	* gfortran.dg/pr93792.f90: Turn XFAIL into PASS.
	* gfortran.dg/ISO_Fortran_binding_19.f90: New test.
	* gfortran.dg/assumed_type_12.f90: New test.
	* gfortran.dg/assumed_type_13.c: New test.
	* gfortran.dg/assumed_type_13.f90: New test.
	* gfortran.dg/bind-c-char-descr.f90: New test.
	* gfortran.dg/bind-c-contiguous-1.c: New test.
	* gfortran.dg/bind-c-contiguous-1.f90: New test.
	* gfortran.dg/bind-c-contiguous-2.f90: New test.
	* gfortran.dg/bind-c-contiguous-3.c: New test.
	* gfortran.dg/bind-c-contiguous-3.f90: New test.
	* gfortran.dg/bind-c-contiguous-4.c: New test.
	* gfortran.dg/bind-c-contiguous-4.f90: New test.
	* gfortran.dg/bind-c-contiguous-5.c: New test.
	* gfortran.dg/bind-c-contiguous-5.f90: New test.
2021-10-18 10:29:30 +02:00
GCC Administrator 93d183a5ff Daily bump. 2021-10-16 00:16:27 +00:00
Jakub Jelinek a10794eafb openmp: Improve testsuite/libgomp.c/affinity-1.c testcase
I've noticed that while I have added hopefully sufficient test coverage
for the case where one uses simple number or !number as p-interval,
I haven't added any coverage for number:len:stride or number:len.

This patch adds that.

2021-10-15  Jakub Jelinek  <jakub@redhat.com>

	* testsuite/libgomp.c/affinity-1.c (struct places): Change name field
	type from char [50] to const char *.
	(places_array): Add a testcase for simplified syntax place followed
	by length or length and stride.
2021-10-15 17:19:54 +02:00
Jakub Jelinek 4a0fed0c0c openmp: Handle OpenMP 5.1 simplified OMP_PLACES syntax
In addition to adding ll_caches and numa_domain abstract names
to OMP_PLACES syntax, OpenMP 5.1 also added one syntax simplification:
https://github.com/OpenMP/spec/issues/2080
https://github.com/OpenMP/spec/pull/2081
in particular that in the grammar place non-terminal is now
not only { res-list } but also res (i.e. a non-negative integer),
which stands as a shortcut for { res }
So, one can specify OMP_PLACES=0,4,8,12 with the meaning
OMP_PLACES={0},{4},{8},{12} or OMP_PLACES=0:4 instead of OMP_PLACES={0}:4
or OMP_PLACES={0},{1},{2},{3} etc.

This patch implements that.

2021-10-15  Jakub Jelinek  <jakub@redhat.com>

	* env.c (parse_one_place): Handle non-negative-number the same
	as { non-negative-number }.  Reject even !number:1 and
	!number:1:stride or !place:1 or !place:1:stride instead of just
	length other than 1.
	* libgomp.texi (OpenMP 5.1): Document OMP_PLACES syntax extensions
	and OMP_NUM_TEAMS/OMP_TEAMS_THREAD_LIMIT and
	omp_{set_num,get_max}_teams/omp_{s,g}et_teams_thread_limit features
	as implemented.
	* testsuite/libgomp.c/affinity-1.c: Add a test for the 5.1 place
	simplified syntax.
2021-10-15 16:35:57 +02:00
Jakub Jelinek c057ed9c52 openmp: Fix up strtoul and strtoull uses in libgomp
Yesterday when working on numa_domains, I've noticed because of a bug
in my patch a hang on a large NUMA machine.  I've fixed the bug, but
also discovered that the hang was a result of making wrong assumptions
about strtoul/strtoull.  All the uses were for portability setting
errno = 0 before the calls and treating non-zero errno after the call
as invalid input, but for the case where there are no valid digits at
all strtoul may set errno to EINVAL, but doesn't have to and with
glibc doesn't do that.  So, this patch goes through all the strtoul calls
and next to errno != 0 checks adds also endptr == startptr check.
Haven't done it in places where we immediately reject strtoul returning 0
the same as we reject errno != 0, because strtoul must return 0 in the
case where it sets endptr to the start pointer.  In some spots the code
was using errno = 0; x = strtoul (p, &p, 10); if (errno) { /*invalid*/ }
and those spots had to be changed to
errno = 0; x = strtoul (p, &end, 10); if (errno || end == p) { /*invalid*/ }
p = end;

2021-10-15  Jakub Jelinek  <jakub@redhat.com>

	* env.c (parse_schedule): For strtoul or strtoull calls which don't
	clearly reject return value 0 as invalid handle the case where end
	pointer is the same as first argument as invalid.
	(parse_unsigned_long_1): Likewise.
	(parse_one_place): Likewise.
	(parse_places_var): Likewise.
	(parse_stacksize): Likewise.
	(parse_spincount): Likewise.
	(parse_affinity): Likewise.
	(parse_gomp_openacc_dim): Likewise.  Avoid strict aliasing violation.
	Make code valid C89.
	* config/linux/affinity.c (gomp_affinity_find_last_cache_level):
	For strtoul calls which don't clearly reject return value 0 as
	invalid handle the case where end pointer is the same as first
	argument as invalid.
	(gomp_affinity_init_level_1): Likewise.
	(gomp_affinity_init_numa_domains): Likewise.
	* config/rtems/proc.c (parse_thread_pools): Likewise.
2021-10-15 16:28:34 +02:00
Jakub Jelinek 4764049dd6 openmp: Fix up handling of OMP_PLACES=threads(1)
When writing the places-*.c tests, I've noticed that we mishandle threads
abstract name with specified num-places if num-places isn't a multiple of
number of hw threads in a core.  It then happily ignores the maximum count
and overwrites for the remaining hw threads in a core further places that
haven't been allocated.

2021-10-15  Jakub Jelinek  <jakub@redhat.com>

	* config/linux/affinity.c (gomp_affinity_init_level_1): For level 1
	after creating count places clean up and return immediately.
	* testsuite/libgomp.c/places-6.c: New test.
	* testsuite/libgomp.c/places-7.c: New test.
	* testsuite/libgomp.c/places-8.c: New test.
	* testsuite/libgomp.c/places-9.c: New test.
	* testsuite/libgomp.c/places-10.c: New test.
2021-10-15 16:25:25 +02:00
Jakub Jelinek e7ce32c783 openmp: Add support for OMP_PLACES=numa_domains
This adds support for numa_domains abstract name in OMP_PLACES, also new
in OpenMP 5.1.

Way to test this is
OMP_PLACES=numa_domains OMP_DISPLAY_ENV=true LD_PRELOAD=.libs/libgomp.so.1 /bin/true
and see what it prints on OMP_PLACES line.
For non-NUMA machines it should print a single place that covers all CPUs,
for NUMA machine one place for each NUMA node with corresponding CPUs.

2021-10-15  Jakub Jelinek  <jakub@redhat.com>

	* env.c (parse_places_var): Handle numa_domains as level 5.
	* config/linux/affinity.c (gomp_affinity_init_numa_domains): New
	function.
	(gomp_affinity_init_level): Use it instead of
	gomp_affinity_init_level_1 for level == 5.
	* testsuite/libgomp.c/places-5.c: New test.
2021-10-15 12:16:50 +02:00
Jakub Jelinek 5809be05a2 openmp: Add support for OMP_PLACES=ll_caches
This patch implements support for ll_caches abstract name in OMP_PLACES,
which stands for places where logical cpus in each place share the last
level cache.

This seems to work fine for me on x86 and kernel sources show that it is
in common code, but on some machines on CompileFarm the files I'm using,
i.e.
/sys/devices/system/cpu/cpuN/cache/indexN/level
/sys/devices/system/cpu/cpuN/cache/indexN/shared_cpu_list
don't exist, is that because they have too old kernel and newer kernels
are fine or should I implement some fallback methods (which)?
E.g. on gcc112.fsffrance.org I see just shared_cpu_map and not shared_cpu_list
(with shared_cpu_map being harder to parse) and on another box I didn't even
see the cache subdirectories.

Way to test this is
OMP_PLACES=ll_caches OMP_DISPLAY_ENV=true LD_PRELOAD=.libs/libgomp.so.1 /bin/true
and see what it prints on OMP_PLACES line.

2021-10-15  Jakub Jelinek  <jakub@redhat.com>

	* env.c (parse_places_var): Handle ll_caches as level 4.
	* config/linux/affinity.c (gomp_affinity_find_last_cache_level): New
	function.
	(gomp_affinity_init_level_1): Handle level 4 as logical cpus sharing
	last level cache.
	(gomp_affinity_init_level): Likewise.
	* testsuite/libgomp.c/places-1.c: New test.
	* testsuite/libgomp.c/places-2.c: New test.
	* testsuite/libgomp.c/places-3.c: New test.
	* testsuite/libgomp.c/places-4.c: New test.
2021-10-15 12:06:51 +02:00
GCC Administrator 5d5885c99c Daily bump. 2021-10-15 00:17:02 +00:00
Kwok Cheung Yeung 2c4666fb06 openmp: Mark declare variant directive in documentation as supported in Fortran
2021-10-14  Kwok Cheung Yeung  <kcy@codesourcery.com>

libgomp/
	* libgomp.texi (OpenMP 5.0): Update entry for declare variant
	directive.
2021-10-14 09:35:33 -07:00
Kwok Cheung Yeung 724ee5a009 openmp, fortran: Add support for OpenMP declare variant directive in Fortran
2021-10-14  Kwok Cheung Yeung  <kcy@codesourcery.com>

gcc/c-family/

	* c-omp.c (c_omp_check_context_selector): Rename to
	omp_check_context_selector and move to omp-general.c.
	(c_omp_mark_declare_variant): Rename to omp_mark_declare_variant and
	move to omp-general.c.

gcc/c/

	* c-parser.c (c_finish_omp_declare_variant): Change call from
	c_omp_check_context_selector to omp_check_context_selector. Change
	call from c_omp_mark_declare_variant to omp_mark_declare_variant.

gcc/cp/

	* decl.c (omp_declare_variant_finalize_one): Change call from
	c_omp_mark_declare_variant to omp_mark_declare_variant.
	* parser.c (cp_finish_omp_declare_variant): Change call from
	c_omp_check_context_selector to omp_check_context_selector.

gcc/fortran/

	* gfortran.h (enum gfc_statement): Add ST_OMP_DECLARE_VARIANT.
	(enum gfc_omp_trait_property_kind): New.
	(struct gfc_omp_trait_property): New.
	(gfc_get_omp_trait_property): New macro.
	(struct gfc_omp_selector): New.
	(gfc_get_omp_selector): New macro.
	(struct gfc_omp_set_selector): New.
	(gfc_get_omp_set_selector): New macro.
	(struct gfc_omp_declare_variant): New.
	(gfc_get_omp_declare_variant): New macro.
	(struct gfc_namespace): Add omp_declare_variant field.
	(gfc_free_omp_declare_variant_list): New prototype.
	* match.h (gfc_match_omp_declare_variant): New prototype.
	* openmp.c (gfc_free_omp_trait_property_list): New.
	(gfc_free_omp_selector_list): New.
	(gfc_free_omp_set_selector_list): New.
	(gfc_free_omp_declare_variant_list): New.
	(gfc_match_omp_clauses): Add extra optional argument.  Handle end of
	clauses for context selectors.
	(omp_construct_selectors, omp_device_selectors,
	omp_implementation_selectors, omp_user_selectors): New.
	(gfc_match_omp_context_selector): New.
	(gfc_match_omp_context_selector_specification): New.
	(gfc_match_omp_declare_variant): New.
	* parse.c: Include tree-core.h and omp-general.h.
	(decode_omp_directive): Handle 'declare variant'.
	(case_omp_decl): Include ST_OMP_DECLARE_VARIANT.
	(gfc_ascii_statement): Handle ST_OMP_DECLARE_VARIANT.
	(gfc_parse_file): Initialize omp_requires_mask.
	* symbol.c (gfc_free_namespace): Call
	gfc_free_omp_declare_variant_list.
	* trans-decl.c (gfc_get_extern_function_decl): Call
	gfc_trans_omp_declare_variant.
	(gfc_create_function_decl): Call gfc_trans_omp_declare_variant.
	* trans-openmp.c (gfc_trans_omp_declare_variant): New.
	* trans-stmt.h (gfc_trans_omp_declare_variant): New prototype.

gcc/

	* omp-general.c (omp_check_context_selector):  Move from c-omp.c.
	(omp_mark_declare_variant): Move from c-omp.c.
	(omp_context_name_list_prop): Update for Fortran strings.
	* omp-general.h (omp_check_context_selector): New prototype.
	(omp_mark_declare_variant): New prototype.

gcc/testsuite/

	* gfortran.dg/gomp/declare-variant-1.f90: New test.
	* gfortran.dg/gomp/declare-variant-10.f90: New test.
	* gfortran.dg/gomp/declare-variant-11.f90: New test.
	* gfortran.dg/gomp/declare-variant-12.f90: New test.
	* gfortran.dg/gomp/declare-variant-13.f90: New test.
	* gfortran.dg/gomp/declare-variant-14.f90: New test.
	* gfortran.dg/gomp/declare-variant-15.f90: New test.
	* gfortran.dg/gomp/declare-variant-16.f90: New test.
	* gfortran.dg/gomp/declare-variant-17.f90: New test.
	* gfortran.dg/gomp/declare-variant-18.f90: New test.
	* gfortran.dg/gomp/declare-variant-19.f90: New test.
	* gfortran.dg/gomp/declare-variant-2.f90: New test.
	* gfortran.dg/gomp/declare-variant-2a.f90: New test.
	* gfortran.dg/gomp/declare-variant-3.f90: New test.
	* gfortran.dg/gomp/declare-variant-4.f90: New test.
	* gfortran.dg/gomp/declare-variant-5.f90: New test.
	* gfortran.dg/gomp/declare-variant-6.f90: New test.
	* gfortran.dg/gomp/declare-variant-7.f90: New test.
	* gfortran.dg/gomp/declare-variant-8.f90: New test.
	* gfortran.dg/gomp/declare-variant-9.f90: New test.

libgomp/

	* testsuite/libgomp.fortran/declare-variant-1.f90: New test.
2021-10-14 09:16:36 -07:00
GCC Administrator 52055987fb Daily bump. 2021-10-13 00:16:22 +00:00
Julian Brown ccfcf08e66 libgomp: Release device lock on cbuf error path
This patch releases the device lock on a sanity-checking error path in
transfer combining (cbuf) handling in libgomp:target.c.  This shouldn't
happen when handling well-formed mapping clauses, but erroneous clauses
can currently cause a hang if the condition triggers.

2021-12-10  Julian Brown  <julian@codesourcery.com>

libgomp/
	* target.c (gomp_copy_host2dev): Release device lock on cbuf
	error path.
2021-10-12 06:50:26 -07:00
Tobias Burnus f5a538e164 Fortran version of libgomp.c-c++-common/icv-{3,4}.c
This adds the Fortran testsuite coverage of
omp_{get_max,set_num}_threads and omp_{s,g}et_teams_thread_limit

libgomp/
	* testsuite/libgomp.fortran/icv-3.f90: New.
	* testsuite/libgomp.fortran/icv-4.f90: New.
2021-10-12 10:54:18 +02:00
Jakub Jelinek 4096bf82a0 openmp: Add documentation for omp_{get_max, set_num}_threads and omp_{s, g}et_teams_thread_limit
This patch adds documentation for these new OpenMP 5.1 APIs as well as
two new environment variables - OMP_NUM_TEAMS and OMP_TEAMS_THREAD_LIMIT.

2021-10-12  Jakub Jelinek  <jakub@redhat.com>

	* libgomp.texi (omp_get_max_teams, omp_get_teams_thread_limit,
	omp_set_num_teams, omp_set_teams_thread_limit, OMP_NUM_TEAMS,
	OMP_TEAMS_THREAD_LIMIT): Document.
2021-10-12 09:35:43 +02:00
Jakub Jelinek de7fa7063e openmp: Fix up warnings on libgomp.info build
When building libgomp documentation, I see
makeinfo --split-size=5000000  -I ../../../libgomp/../gcc/doc/include -I ../../../libgomp -o libgomp.info ../../../libgomp/libgomp.texi
../../../libgomp/libgomp.texi:503: warning: node next `omp_get_default_device' in menu `omp_get_device_num' and in sectioning `omp_get_dynamic' differ
../../../libgomp/libgomp.texi:528: warning: node prev `omp_get_dynamic' in menu `omp_get_device_num' and in sectioning `omp_get_default_device' differ
../../../libgomp/libgomp.texi:560: warning: node next `omp_get_initial_device' in menu `omp_get_level' and in sectioning `omp_get_device_num' differ
../../../libgomp/libgomp.texi:587: warning: node next `omp_get_device_num' in menu `omp_get_dynamic' and in sectioning `omp_get_level' differ
../../../libgomp/libgomp.texi:587: warning: node prev `omp_get_device_num' in menu `omp_get_default_device' and in sectioning `omp_get_initial_device' differ
../../../libgomp/libgomp.texi:615: warning: node prev `omp_get_level' in menu `omp_get_initial_device' and in sectioning `omp_get_device_num' differ
warnings.  This patch fixes those.

2021-10-12  Jakub Jelinek  <jakub@redhat.com>

	* libgomp.texi (omp_get_device_num): Move @node before omp_get_dynamic
	to avoid makeinfo warnings.
2021-10-12 09:34:38 +02:00
Jakub Jelinek 88f5ad524a openmp: Add testsuite coverage for omp_{get_max,set_num}_threads and omp_{s,g}et_teams_thread_limit
This adds (C/C++ only) testsuite coverage for these new OpenMP 5.1 APIs.

2021-10-12  Jakub Jelinek  <jakub@redhat.com>

	* testsuite/libgomp.c-c++-common/icv-3.c: New test.
	* testsuite/libgomp.c-c++-common/icv-4.c: New test.
2021-10-12 09:32:28 +02:00
Jakub Jelinek 342aedf0e5 libgomp: alloc* test fixes [PR102628, PR102668]
As reported, the alloc-9.c test and alloc-{1,2,3}.F* and alloc-11.f90
tests fail on powerpc64-linux with -m32.
The reason why it fails just there is that malloc doesn't guarantee there
128-bit alignment (historically glibc guaranteed 2 * sizeof (void *)
alignment from malloc).

There are two separate issues.
One is a thinko on my side.
In this part of alloc-9.c test (copied to alloc-11.f90), we have
2 allocators, a with pool size 1024B and alignment 16B and default fallback
and a2 with pool size 512B and alignment 32B and a as fallback allocator.
We start at no allocations in both at line 194 and do:
  p = (int *) omp_alloc (sizeof (int), a2);
// This succeeds in a2 and needs 4+overhead bytes (which includes the 32B alignment)
  p = (int *) omp_realloc (p, 420, a, a2);
// This allocates 420 bytes+overhead in a, with 16B alignment and deallocates the above
  q = (int *) omp_alloc (sizeof (int), a);
// This allocates 4+overhead bytes in a, with 16B alignment
  q = (int *) omp_realloc (q, 420, a2, a);
// This allocates 420+overhead in a2 with 32B alignment
  q = (int *) omp_realloc (q, 768, a2, a2);
// This attempts to reallocate, but as there are elevated alignment
// requirements doesn't try to just realloc (even if it wanted to try that
// a2 is almost full, with 512-420-overhead bytes left in it), so it
// tries to alloc in a2, but there is no space left in the pool, falls
// back to a, which already has 420+overhead bytes allocated in it and
// 1024-420-overhead bytes left and so fails too and fails to default
// non-pool allocator that allocates it, but doesn't guarantee alignment
// higher than malloc guarantees.
// But, the test expected 16B alignment.

So, I've slightly lowered the allocation sizes in that part of the test
420->320 and 768 -> 568, so that the last test still fails to allocate
in a2 (568 > 512-320-overhead) but succeeds in a as fallback, which was
the intent of the test.

Another thing is that alloc-1.F90 seems to be transcription of
libgomp.c-c++-common/alloc-1.c into Fortran, but alloc-1.c had:
  q = (int *) omp_alloc (768, a2);
  if ((((uintptr_t) q) % 16) != 0)
    abort ();
  q[0] = 7;
  q[767 / sizeof (int)] = 8;
  r = (int *) omp_alloc (512, a2);
  if ((((uintptr_t) r) % __alignof (int)) != 0)
    abort ();
there but Fortran has:
        cq = omp_alloc (768_c_size_t, a2)
        if (mod (transfer (cq, intptr), 16_c_intptr_t) /= 0) stop 12
        call c_f_pointer (cq, q, [768 / c_sizeof (i)])
        q(1) = 7
        q(768 / c_sizeof (i)) = 8
        cr = omp_alloc (512_c_size_t, a2)
        if (mod (transfer (cr, intptr), 16_c_intptr_t) /= 0) stop 13
I'm changing the latter to 4_c_intptr_t because other spots in the
testcase do that, Fortran sadly doesn't have c_alignof, but strictly
speaking it isn't correct, __alignof (int) could be on some architectures
smaller than 4.
So probably alloc-1.F90 etc. should also have
! { dg-additional-sources alloc-7.c }
! { dg-prune-output "command-line option '-fintrinsic-modules-path=.*' is valid for Fortran but not for C" }
and use get__alignof_int.

2021-10-12  Jakub Jelinek  <jakub@redhat.com>

	PR libgomp/102628
	PR libgomp/102668
	* testsuite/libgomp.c-c++-common/alloc-9.c (main): Decrease
	allocation sizes from 420 to 320 and from 768 to 568.
	* testsuite/libgomp.fortran/alloc-11.f90: Likewise.
	* testsuite/libgomp.fortran/alloc-1.F90: Change expected alignment
	for cr from 16 to 4.
2021-10-12 09:30:41 +02:00
Jakub Jelinek fab2f61dc1 vectorizer: Fix up -fsimd-cost-model= handling
>	* testsuite/libgomp.c++/scan-10.C: Add option -fvect-cost-model=cheap.

I don't think this is the right thing to do.
This just means that at some point between 2013 when -fsimd-cost-model has
been introduced and now -fsimd-cost-model= option at least partially stopped
working properly.
As documented, -fsimd-cost-model= overrides the -fvect-cost-model= setting
for OpenMP simd loops (loop->force_vectorize is true) if specified differently
from default.
In tree-vectorizer.h we have:
static inline bool
unlimited_cost_model (loop_p loop)
{
  if (loop != NULL && loop->force_vectorize
      && flag_simd_cost_model != VECT_COST_MODEL_DEFAULT)
    return flag_simd_cost_model == VECT_COST_MODEL_UNLIMITED;
  return (flag_vect_cost_model == VECT_COST_MODEL_UNLIMITED);
}
and use it in various places, but we also just use flag_vect_cost_model
in lots of places (and in one spot use flag_simd_cost_model, not sure if
we are sure it is a force_vectorize loop or what).

So, IMHO we should change the above inline function to
loop_cost_model and let it return the cost model and then just
reimplement unlimited_cost_model as
return loop_cost_model (loop) == VECT_COST_MODEL_UNLIMITED;
and then adjust the direct uses of the flag and revert these changes.

2021-10-12  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* tree-vectorizer.h (loop_cost_model): New function.
	(unlimited_cost_model): Use it.
	* tree-vect-loop.c (vect_analyze_loop_costing): Use loop_cost_model
	call instead of flag_vect_cost_model.
	* tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Likewise.
	(vect_prune_runtime_alias_test_list): Likewise.  Also use it instead
	of flag_simd_cost_model.
gcc/testsuite/
	* gcc.dg/gomp/simd-2.c: Remove option -fvect-cost-model=cheap.
	* gcc.dg/gomp/simd-3.c: Likewise.
libgomp/
	* testsuite/libgomp.c/scan-11.c: Remove option -fvect-cost-model=cheap.
	* testsuite/libgomp.c/scan-12.c: Likewise.
	* testsuite/libgomp.c/scan-13.c: Likewise.
	* testsuite/libgomp.c/scan-14.c: Likewise.
	* testsuite/libgomp.c/scan-15.c: Likewise.
	* testsuite/libgomp.c/scan-16.c: Likewise.
	* testsuite/libgomp.c/scan-17.c: Likewise.
	* testsuite/libgomp.c/scan-18.c: Likewise.
	* testsuite/libgomp.c/scan-19.c: Likewise.
	* testsuite/libgomp.c/scan-20.c: Likewise.
	* testsuite/libgomp.c/scan-21.c: Likewise.
	* testsuite/libgomp.c/scan-22.c: Likewise.
	* testsuite/libgomp.c++/scan-9.C: Likewise.
	* testsuite/libgomp.c++/scan-10.C: Likewise.
	* testsuite/libgomp.c++/scan-11.C: Likewise.
	* testsuite/libgomp.c++/scan-12.C: Likewise.
	* testsuite/libgomp.c++/scan-13.C: Likewise.
	* testsuite/libgomp.c++/scan-14.C: Likewise.
	* testsuite/libgomp.c++/scan-15.C: Likewise.
	* testsuite/libgomp.c++/scan-16.C: Likewise.
2021-10-12 09:28:10 +02:00
liuhongt d61ce6ab04 Adjust testcase for O2 vectorization enabling
This issue was observed in rs6000 specific PR102658 as well.

I've looked into it a bit, it's caused by the "conditional store replacement" which
is originally disabled without vectorization as below code.

  /* If either vectorization or if-conversion is disabled then do
     not sink any stores.  */
  if (param_max_stores_to_sink == 0
      || (!flag_tree_loop_vectorize && !flag_tree_slp_vectorize)
      || !flag_tree_loop_if_convert)
    return false;

The new change makes the innermost loop look like

for (int c1 = 0; c1 <= 1499; c1 += 1) {
  if (c1 <= 500) {
     S_10(c0, c1);
  } else {
      S_9(c0, c1);
  }
  S_11(c0, c1);
}

and can not be splitted as:

for (int c1 = 0; c1 <= 500; c1 += 1)
  S_10(c0, c1);

for (int c1 = 501; c1 <= 1499; c1 += 1)
  S_9(c0, c1);

So instead of disabling vectorization, could we just disable this cs replacement
with parameter "--param max-stores-to-sink=0"?

I tested this proposal on ppc64le, it should work as well.

2021-10-11  Kewen Lin  <linkw@linux.ibm.com>

libgomp/ChangeLog:

	* testsuite/libgomp.graphite/force-parallel-8.c: Add --param max-stores-to-sink=0.
2021-10-12 15:24:12 +08:00
GCC Administrator 732d763847 Daily bump. 2021-10-12 00:17:02 +00:00
Marcel Vollweiler f70977936a libgomp: Add tests for omp_atv_serialized and deprecate omp_atv_sequential.
The variable omp_atv_sequential was replaced by omp_atv_serialized in OpenMP
5.1. This was already implemented by Jakub (C/C++, commit ea82325afe) and
Tobias (Fortran, commit fff15bad1a).

This patch adds two tests to check if omp_atv_serialized is available (one test
for C/C++ and one for Fortran). Besides that omp_atv_sequential is marked as
deprecated in C/C++ and Fortran for OpenMP 5.1.

libgomp/ChangeLog:

	* allocator.c (omp_init_allocator): Replace omp_atv_sequential with
	omp_atv_serialized.
	* omp.h.in: Add deprecated flag for omp_atv_sequential.
	* omp_lib.f90.in: Add deprecated flag for omp_atv_sequential.
	* testsuite/libgomp.c-c++-common/alloc-10.c: New test.
	* testsuite/libgomp.fortran/alloc-12.f90: New test.
2021-10-11 04:34:51 -07:00
Jakub Jelinek 07dd3bcda1 openmp: Add omp_set_num_teams, omp_get_max_teams, omp_[gs]et_teams_thread_limit
OpenMP 5.1 adds env vars and functions to set and query new ICVs used
as fallback if thread_limit or num_teams clauses aren't specified on
teams construct.

The following patch implements those, though further work will be needed:
1) OpenMP 5.1 also changed the num_teams clause, so that it can specify
   both lower and upper limit for how many teams should be created and
   changed the meaning when only one expression is provided, instead of
   num_teams(expr) in 5.0 meaning num_teams(1:expr) in 5.1, it now means
   num_teams(expr:expr), i.e. while previously we could create 1 to expr
   teams, in 5.1 we have some low limit by default equal to the single
   expression provided and may not create fewer teams.
   For host teams (which we don't currently implement efficiently for
   NUMA hosts) we trivially satisfy it now by always honoring what the
   user asked for, but for the offloading teams I think we'll need to
   rethink the APIs; currently teams construct is just a call that returns
   and possibly lowers the number of teams; and whenever possible we try
   to evaluate num_teams/thread_limit already on the target construct
   and the GOMP_teams call just sets the number of teams to the minimum
   of provided and requested teams; for some cases e.g. where target
   is not combined with teams and num_teams expression calls some functions
   etc., we need to call those functions in the target region and so it is
   late to figure number of teams, but also hw could just limit what it
   is willing to create; in that case I'm afraid we need to run the target
   body multiple times and arrange for omp_get_team_num () returning the
   right values
2) we need to finally implement the NUMA handling for GOMP_teams_reg
3) I now realize I haven't added some testcase coverage, will do that
   incrementally
4) libgomp.texi needs updates for these new APIs, but also others like
   the allocator

2021-10-11  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* omp-low.c (omp_runtime_api_call): Handle omp_get_max_teams,
	omp_[sg]et_teams_thread_limit and omp_set_num_teams.
libgomp/
	* omp.h.in (omp_set_num_teams, omp_get_max_teams,
	omp_set_teams_thread_limit, omp_get_teams_thread_limit): Declare.
	* omp_lib.f90.in (omp_set_num_teams, omp_get_max_teams,
	omp_set_teams_thread_limit, omp_get_teams_thread_limit): Declare.
	* omp_lib.h.in (omp_set_num_teams, omp_get_max_teams,
	omp_set_teams_thread_limit, omp_get_teams_thread_limit): Declare.
	* libgomp.h (gomp_nteams_var, gomp_teams_thread_limit_var): Declare.
	* libgomp.map (OMP_5.1): Export omp_get_max_teams{,_},
	omp_get_teams_thread_limit{,_}, omp_set_num_teams{,_,_8_} and
	omp_set_teams_thread_limit{,_,_8_}.
	* icv.c (omp_set_num_teams, omp_get_max_teams,
	omp_set_teams_thread_limit, omp_get_teams_thread_limit): New
	functions.
	* env.c (gomp_nteams_var, gomp_teams_thread_limit_var): Define.
	(omp_display_env): Print OMP_NUM_TEAMS and OMP_TEAMS_THREAD_LIMIT.
	(initialize_env): Handle OMP_NUM_TEAMS and OMP_TEAMS_THREAD_LIMIT env
	vars.
	* teams.c (GOMP_teams_reg): If thread_limit is not specified, use
	gomp_teams_thread_limit_var as fallback if not zero.  If num_teams
	is not specified, use gomp_nteams_var.
	* fortran.c (omp_set_num_teams, omp_get_max_teams,
	omp_set_teams_thread_limit, omp_get_teams_thread_limit): Add
	ialias_redirect.
	(omp_set_num_teams_, omp_set_num_teams_8_, omp_get_max_teams_,
	omp_set_teams_thread_limit_, omp_set_teams_thread_limit_8_,
	omp_get_teams_thread_limit_): New functions.
2021-10-11 12:20:22 +02:00
GCC Administrator c9db17b880 Daily bump. 2021-10-10 00:16:19 +00:00
liuhongt b4e81f6dd4 Adjust more testcases for O2 vectorization enabling.
libgomp/ChangeLog:

	* testsuite/libgomp.c++/scan-10.C: Add option -fvect-cost-model=cheap.
	* testsuite/libgomp.c++/scan-11.C: Ditto.
	* testsuite/libgomp.c++/scan-12.C: Ditto.
	* testsuite/libgomp.c++/scan-13.C: Ditto.
	* testsuite/libgomp.c++/scan-14.C: Ditto.
	* testsuite/libgomp.c++/scan-15.C: Ditto.
	* testsuite/libgomp.c++/scan-16.C: Ditto.
	* testsuite/libgomp.c++/scan-9.C: Ditto.
	* testsuite/libgomp.c-c++-common/lastprivate-conditional-7.c: Ditto.
	* testsuite/libgomp.c-c++-common/lastprivate-conditional-8.c: Ditto.
	* testsuite/libgomp.c/scan-11.c: Ditto.
	* testsuite/libgomp.c/scan-12.c: Ditto.
	* testsuite/libgomp.c/scan-13.c: Ditto.
	* testsuite/libgomp.c/scan-14.c: Ditto.
	* testsuite/libgomp.c/scan-15.c: Ditto.
	* testsuite/libgomp.c/scan-16.c: Ditto.
	* testsuite/libgomp.c/scan-17.c: Ditto.
	* testsuite/libgomp.c/scan-18.c: Ditto.
	* testsuite/libgomp.c/scan-19.c: Ditto.
	* testsuite/libgomp.c/scan-20.c: Ditto.
	* testsuite/libgomp.c/scan-21.c: Ditto.
	* testsuite/libgomp.c/scan-22.c: Ditto.

gcc/testsuite/ChangeLog:

	* g++.dg/tree-ssa/pr94403.C: Add -fno-tree-vectorize
	* gcc.dg/optimize-bswapsi-5.c: Ditto.
	* gcc.dg/optimize-bswapsi-6.c: Ditto.
	* gcc.dg/Warray-bounds-51.c: Add additional option
	-mtune=generic for target x86/i?86
	* gcc.dg/Wstringop-overflow-14.c: Ditto.
2021-10-09 16:28:11 +08:00
Jakub Jelinek 875124eb08 openmp: Add support for OpenMP 5.1 structured-block-sequences
Related to this is the addition of structured-block-sequence in OpenMP 5.1,
which doesn't change anything for Fortran, but for C/C++ allows multiple
statements instead of just one possibly compound around the separating
directives (section and scan).

I've also made some updates to the OpenMP 5.1 support list in libgomp.texi.

2021-10-09  Jakub Jelinek  <jakub@redhat.com>

gcc/c/
	* c-parser.c (c_parser_omp_structured_block_sequence): New function.
	(c_parser_omp_scan_loop_body): Use it.
	(c_parser_omp_sections_scope): Likewise.
gcc/cp/
	* parser.c (cp_parser_omp_structured_block): Remove disallow_omp_attrs
	argument.
	(cp_parser_omp_structured_block_sequence): New function.
	(cp_parser_omp_scan_loop_body): Use it.
	(cp_parser_omp_sections_scope): Likewise.
gcc/testsuite/
	* c-c++-common/gomp/sections1.c (foo): Don't expect errors on
	multiple statements in between section directive(s).  Add testcases
	for invalid no statements in between section directive(s).
	* gcc.dg/gomp/sections-2.c (foo): Don't expect errors on
	multiple statements in between section directive(s).
	* g++.dg/gomp/sections-2.C (foo): Likewise.
	* g++.dg/gomp/attrs-6.C (foo): Add testcases for multiple
	statements in between section directive(s).
	(bar): Add testcases for multiple statements in between scan
	directive.
	* g++.dg/gomp/attrs-7.C (bar): Adjust expected error recovery.
libgomp/
	* libgomp.texi (OpenMP 5.1): Mention implemented support for
	structured block sequences in C/C++.  Mention support for
	unconstrained/reproducible modifiers on order clause.
	Mention partial (C/C++ only) support of extentensions to atomics
	construct.  Mention partial (C/C++ on clause only) support of
	align/allocator modifiers on allocate clause.
2021-10-09 10:14:36 +02:00
GCC Administrator e3e07b8955 Daily bump. 2021-10-03 00:16:17 +00:00
Tobias Burnus 703d8a4d39 Add libgomp.fortran/order-reproducible-*.f90
libgomp/ChangeLog:

	* testsuite/libgomp.fortran/order-reproducible-1.f90: New test
	based on libgomp.c-c++-common/order-reproducible-1.c.
	* testsuite/libgomp.fortran/order-reproducible-2.f90: Likewise.
	* testsuite/libgomp.fortran/my-usleep.c: New test.
2021-10-02 11:29:35 +02:00
GCC Administrator 9d116bcc55 Daily bump. 2021-10-02 00:16:31 +00:00
Tobias Burnus 2a93d18da3 Add/update libgomp.fortran/alloc-*.f90
libgomp/ChangeLog:

	* testsuite/libgomp.fortran/alloc-10.f90: Fix alignment check.
	* testsuite/libgomp.fortran/alloc-7.f90: Fix array access.
	* testsuite/libgomp.fortran/alloc-8.f90: Likewise.
	* testsuite/libgomp.fortran/alloc-11.f90: New test for omp_realloc,
	based on libgomp.c-c++-common/alloc-9.c.
2021-10-01 20:03:25 +02:00
Jakub Jelinek e705b8533a openmp: Differentiate between order(concurrent) and order(reproducible:concurrent)
While OpenMP 5.1 implies order(concurrent) is the same thing as
order(reproducible:concurrent), this is going to change in OpenMP 5.2, where
essentially order(concurrent) means nothing is stated on whether it is
reproducible or unconstrained (and is determined by other means, e.g. for/do
with schedule static or runtime with static being selected is implicitly
reproducible, distribute with dist_schedule static is implicitly reproducible,
loop is implicitly reproducible) and when the modifier is specified explicitly,
it overrides the implicit behavior either way.
And, when order(reproducible:concurrent) is used with e.g. schedule(dynamic)
or some other schedule that is by definition not reproducible, it is
implementation's duty to ensure it is reproducible, either by remembering how
it scheduled some loop and then replaying the same schedule when seeing loops
with the same directive/schedule/number of iterations, or by overriding the
schedule to some reproducible one.

This patch doesn't implement the 5.2 wording just yet, but in the FEs
differentiates between the 3 states - no explicit modifier, explicit reproducible
or explicit unconstrainted, so that the middle-end can easily switch any time.
Instead it follows the 5.1 wording where both order(concurrent) (implicit or
explicit) or order(reproducible:concurrent) imply reproducibility.
And, it implements the easier method, when for/do should be reproducible, it
just chooses static schedule.  order(concurrent) implies no OpenMP APIs in the
loop body nor threadprivate vars, so the exact scheduling isn't (easily at least)
observable.

2021-10-01  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* tree.h (OMP_CLAUSE_ORDER_REPRODUCIBLE): Define.
	* tree-pretty-print.c (dump_omp_clause) <case OMP_CLAUSE_ORDER>: Print
	reproducible: for OMP_CLAUSE_ORDER_REPRODUCIBLE.
	* omp-general.c (omp_extract_for_data): If OMP_CLAUSE_ORDER is seen
	without OMP_CLAUSE_ORDER_UNCONSTRAINED, overwrite sched_kind to
	OMP_CLAUSE_SCHEDULE_STATIC.
gcc/c-family/
	* c-omp.c (c_omp_split_clauses): Also copy
	OMP_CLAUSE_ORDER_REPRODUCIBLE.
gcc/c/
	* c-parser.c (c_parser_omp_clause_order): Set
	OMP_CLAUSE_ORDER_REPRODUCIBLE for explicit reproducible: modifier.
gcc/cp/
	* parser.c (cp_parser_omp_clause_order): Set
	OMP_CLAUSE_ORDER_REPRODUCIBLE for explicit reproducible: modifier.
gcc/fortran/
	* gfortran.h (gfc_omp_clauses): Add order_reproducible bitfield.
	* dump-parse-tree.c (show_omp_clauses): Print REPRODUCIBLE: for it.
	* openmp.c (gfc_match_omp_clauses): Set order_reproducible for
	explicit reproducible: modifier.
	* trans-openmp.c (gfc_trans_omp_clauses): Set
	OMP_CLAUSE_ORDER_REPRODUCIBLE for order_reproducible.
	(gfc_split_omp_clauses): Also copy order_reproducible.
gcc/testsuite/
	* gfortran.dg/gomp/order-5.f90: Adjust scan-tree-dump-times regexps.
libgomp/
	* testsuite/libgomp.c-c++-common/order-reproducible-1.c: New test.
	* testsuite/libgomp.c-c++-common/order-reproducible-2.c: New test.
2021-10-01 10:45:48 +02:00
Jakub Jelinek 3749c3aff6 openmp: Avoid PLT relocations for omp_* symbols in libgomp
This patch avoids the following relocations:
readelf -Wr libgomp.so.1.0.0 | grep omp_
00000000000470e0  0000020700000007 R_X86_64_JUMP_SLOT     000000000001d9d0 omp_fulfill_event@@OMP_5.0.1 + 0
0000000000047170  000000b800000007 R_X86_64_JUMP_SLOT     000000000000e760 omp_display_env@@OMP_5.1 + 0
00000000000471e0  000000e800000007 R_X86_64_JUMP_SLOT     000000000000f910 omp_get_initial_device@@OMP_4.5 + 0
0000000000047280  0000019500000007 R_X86_64_JUMP_SLOT     0000000000015940 omp_get_active_level@@OMP_3.0 + 0
00000000000472c8  0000020d00000007 R_X86_64_JUMP_SLOT     0000000000035210 omp_get_team_num@@OMP_4.0 + 0
00000000000472f0  0000014700000007 R_X86_64_JUMP_SLOT     0000000000035200 omp_get_num_teams@@OMP_4.0 + 0
by using ialias{,_call,_redirect} macros as needed.

We still have many acc_* PLT relocations, could somebody please fix those?
readelf -Wr libgomp.so.1.0.0 | grep acc_
0000000000046fb8  000001ed00000006 R_X86_64_GLOB_DAT      0000000000036350 acc_prof_unregister@@OACC_2.5.1 + 0
0000000000046fd8  000000a400000006 R_X86_64_GLOB_DAT      0000000000035f30 acc_prof_register@@OACC_2.5.1 + 0
0000000000046fe0  000001d100000006 R_X86_64_GLOB_DAT      0000000000035ee0 acc_prof_lookup@@OACC_2.5.1 + 0
0000000000047058  000001dd00000007 R_X86_64_JUMP_SLOT     0000000000031f40 acc_create_async@@OACC_2.5 + 0
0000000000047068  0000011500000007 R_X86_64_JUMP_SLOT     000000000002fc60 acc_get_property@@OACC_2.6 + 0
0000000000047070  000001fb00000007 R_X86_64_JUMP_SLOT     0000000000032ce0 acc_wait_all@@OACC_2.0 + 0
0000000000047080  0000006500000007 R_X86_64_JUMP_SLOT     000000000002f990 acc_on_device@@OACC_2.0 + 0
0000000000047088  000000ae00000007 R_X86_64_JUMP_SLOT     0000000000032140 acc_attach_async@@OACC_2.6 + 0
0000000000047090  0000021900000007 R_X86_64_JUMP_SLOT     000000000002f550 acc_get_device_type@@OACC_2.0 + 0
0000000000047098  000001cb00000007 R_X86_64_JUMP_SLOT     0000000000032090 acc_copyout_finalize@@OACC_2.5 + 0
00000000000470a8  0000005200000007 R_X86_64_JUMP_SLOT     0000000000031f80 acc_copyin@@OACC_2.0 + 0
00000000000470b8  000001ad00000007 R_X86_64_JUMP_SLOT     0000000000032030 acc_delete_finalize@@OACC_2.5 + 0
00000000000470e8  0000010900000007 R_X86_64_JUMP_SLOT     0000000000031f00 acc_create@@OACC_2.0 + 0
00000000000470f8  0000005900000007 R_X86_64_JUMP_SLOT     0000000000032b70 acc_wait_async@@OACC_2.0 + 0
0000000000047110  0000013100000007 R_X86_64_JUMP_SLOT     0000000000032860 acc_async_test@@OACC_2.0 + 0
0000000000047118  000001ff00000007 R_X86_64_JUMP_SLOT     000000000002f720 acc_get_device_num@@OACC_2.0 + 0
0000000000047128  0000019100000007 R_X86_64_JUMP_SLOT     0000000000032020 acc_delete_async@@OACC_2.5 + 0
0000000000047130  000001d200000007 R_X86_64_JUMP_SLOT     000000000002efa0 acc_shutdown@@OACC_2.0 + 0
0000000000047150  000000d000000007 R_X86_64_JUMP_SLOT     0000000000031f00 acc_present_or_create@@OACC_2.0 + 0
0000000000047188  0000019200000007 R_X86_64_JUMP_SLOT     0000000000031910 acc_is_present@@OACC_2.0 + 0
0000000000047190  000001aa00000007 R_X86_64_JUMP_SLOT     000000000002fca0 acc_get_property_string@@OACC_2.6 + 0
00000000000471d0  000001bf00000007 R_X86_64_JUMP_SLOT     0000000000032120 acc_update_self_async@@OACC_2.5 + 0
0000000000047200  0000020500000007 R_X86_64_JUMP_SLOT     0000000000032e00 acc_wait_all_async@@OACC_2.0 + 0
0000000000047208  000000a600000007 R_X86_64_JUMP_SLOT     0000000000031790 acc_deviceptr@@OACC_2.0 + 0
0000000000047218  0000007500000007 R_X86_64_JUMP_SLOT     0000000000032000 acc_delete@@OACC_2.0 + 0
0000000000047238  000001e900000007 R_X86_64_JUMP_SLOT     000000000002f3a0 acc_set_device_type@@OACC_2.0 + 0
0000000000047240  000001f600000007 R_X86_64_JUMP_SLOT     000000000002ef20 acc_init@@OACC_2.0 + 0
0000000000047248  0000018800000007 R_X86_64_JUMP_SLOT     0000000000032060 acc_copyout@@OACC_2.0 + 0
0000000000047258  0000021f00000007 R_X86_64_JUMP_SLOT     0000000000032a80 acc_wait@@OACC_2.0 + 0
0000000000047270  000001bc00000007 R_X86_64_JUMP_SLOT     0000000000032100 acc_update_self@@OACC_2.0 + 0
0000000000047288  0000011400000007 R_X86_64_JUMP_SLOT     0000000000032080 acc_copyout_async@@OACC_2.5 + 0
0000000000047290  0000013d00000007 R_X86_64_JUMP_SLOT     000000000002f850 acc_set_device_num@@OACC_2.0 + 0
00000000000472a8  000000c500000007 R_X86_64_JUMP_SLOT     00000000000320e0 acc_update_device_async@@OACC_2.5 + 0
00000000000472c0  0000014600000007 R_X86_64_JUMP_SLOT     0000000000031fc0 acc_copyin_async@@OACC_2.5 + 0
00000000000472f8  0000006a00000007 R_X86_64_JUMP_SLOT     000000000002f310 acc_get_num_devices@@OACC_2.0 + 0
0000000000047350  0000021700000007 R_X86_64_JUMP_SLOT     0000000000031f80 acc_present_or_copyin@@OACC_2.0 + 0
0000000000047360  0000020900000007 R_X86_64_JUMP_SLOT     00000000000320c0 acc_update_device@@OACC_2.0 + 0
0000000000047380  0000008400000007 R_X86_64_JUMP_SLOT     0000000000032950 acc_async_test_all@@OACC_2.0 + 0

2021-10-01  Jakub Jelinek  <jakub@redhat.com>

	* affinity-fmt.c (omp_get_team_num, omp_get_num_teams): Add
	ialias_redirect.
	* env.c (handle_omp_display_env): Use ialias_call.
	* icv-device.c: Move ialias right below each function.
	(omp_get_device_num): Use ialias_call.
	* fortran.c (omp_fulfill_event): Add ialias_redirect.
	* icv.c (omp_get_active_level): Add ialias_redirect.
2021-10-01 10:42:07 +02:00
Jakub Jelinek 998e434f8f openmp: Add alloc_align attribute to omp_aligned_*alloc and testcase for omp_realloc
This patch adds alloc_align attribute to omp_aligned_{,c}alloc so that if
the first argument is constant, GCC can assume requested alignment.

Additionally, it adds testsuite coverage for omp_realloc which I haven't
managed to write in the patch from yesterday.

2021-10-01  Jakub Jelinek  <jakub@redhat.com>

	* omp.h.in (omp_aligned_alloc, omp_aligned_calloc): Add
	__alloc_align__ (1) attribute.
	* testsuite/libgomp.c-c++-common/alloc-9.c: New test.
2021-10-01 10:32:10 +02:00