Commit Graph

254 Commits

Author SHA1 Message Date
Tom de Vries
e0451f93d9 [nvptx] Add some support for .local atomics
The ptx insn atom doesn't support local memory.  In case of doing an atomic
operation on local memory, we run into:
...
operation not supported on global/shared address space
...
This is the cuGetErrorString message for CUDA_ERROR_INVALID_ADDRESS_SPACE.

The message is somewhat confusing given that actually the operation is not
supported on local address space.

Fix this by falling back on a non-atomic version when detecting
a frame-related memory operand.

This only solves some cases that are detected at compile-time.  It does
however fix the openacc private-atomic-* test-cases.

Tested on x86_64 with nvptx accelerator.

gcc/ChangeLog:

2022-01-27  Tom de Vries  <tdevries@suse.de>

	* config/nvptx/nvptx.md (define_insn "atomic_compare_and_swap<mode>_1")
	(define_insn "atomic_exchange<mode>")
	(define_insn "atomic_fetch_add<mode>")
	(define_insn "atomic_fetch_addsf")
	(define_insn "atomic_fetch_<logic><mode>"): Output non-atomic version
	if memory operands is frame-relative.

gcc/testsuite/ChangeLog:

2022-01-31  Tom de Vries  <tdevries@suse.de>

	* gcc.target/nvptx/stack-atomics-run.c: New test.

libgomp/ChangeLog:

2022-01-27  Tom de Vries  <tdevries@suse.de>

	* testsuite/libgomp.oacc-c-c++-common/private-atomic-1.c: Remove
	PR83812 workaround.
	* testsuite/libgomp.oacc-fortran/private-atomic-1-vector.f90: Same.
	* testsuite/libgomp.oacc-fortran/private-atomic-1-worker.f90: Same.
2022-02-01 19:28:24 +01:00
Tom de Vries
d43fbc7d3f [libgomp, testsuite] Fix insufficient resources in test-cases
When running libgomp test-case broadcast-many.c on an nvptx accelerator
(T400, driver version 470.86), I run into:
...
libgomp: The Nvidia accelerator has insufficient resources to launch \
  'main$_omp_fn$0' with num_workers = 32 and vector_length = 32; \
  recompile the program with 'num_workers = x and vector_length = y' on \
  that offloaded region or '-fopenacc-dim=y' where x * y <= 896.

FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/broadcast-many.c \
  -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  \
  -O0  execution test
...

The error does not occur when using GOMP_NVPTX_JIT=-O0.

Fix this by using 896 / 32 == 28 workers for ACC_DEVICE_TYPE_nvidia.

Likewise for some other test-cases.

Tested libgomp on x86_64 with nvptx accelerator.

libgomp/ChangeLog:

2022-01-27  Tom de Vries  <tdevries@suse.de>

	* testsuite/libgomp.oacc-c-c++-common/broadcast-many.c: Reduce
	num_workers for nvidia accelerator to fix libgomp error 'insufficient
	resources'.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c:
	Same.
	* testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Same.
2022-02-01 08:15:00 +01:00
Thomas Schwinge
087e545747 Strengthen a few OpenACC test cases
Rather than rubber-stamp whatever requested vs. actual device kernel launch
configuration happens, actually (again) verify the requested values (modulo
expected variations).

This better highlights that "AMD GCN has an upper limit of 'num_workers(16)'",
and the deficiency that "AMD GCN uses the autovectorizer for the vector
dimension: the use of a function call in vector-partitioned code [...] is not
currently supported".

And, this removes several instances of race conditions, where variables are
concurrently written to in OpenACC gang-redundant mode.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c: Strengthen.
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-v-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-v-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c: Likewise.
2022-01-21 18:45:30 +01:00
Martin Liska
2aea19bdb1 nvptx: update fix for -Wformat-diag
gcc/ChangeLog:

	* config/nvptx/nvptx.cc (nvptx_goacc_validate_dims_1): Update
	warning messages.

libgomp/ChangeLog:

	* testsuite/libgomp.oacc-c++/privatized-ref-2.C: Update scanning
	patterns.
	* testsuite/libgomp.oacc-c++/privatized-ref-3.C: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/pr85486.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/pr95270-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-nohost-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/struct-copyout-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/struct-copyout-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/vector-length-64-1.c:
	Likewise.
	* testsuite/libgomp.oacc-fortran/attach-descriptor-1.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/derivedtypes-arrays-1.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-loop-2.f95: Likewise.
	* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: Likewise.

Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
2022-01-19 08:27:00 +01:00
Martin Liska
b1f3640912 nvptx: fix -Wformat-diag warnings
gcc/ChangeLog:

	* config/nvptx/nvptx.cc (nvptx_goacc_validate_dims_1): Wrap
	keyword.
	* config/nvptx/nvptx.md: Remove trailing dot.

libgomp/ChangeLog:

	* testsuite/libgomp.oacc-c++/privatized-ref-2.C: Update keyword
	in dg-warning.
	* testsuite/libgomp.oacc-c++/privatized-ref-3.C: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/pr85486.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/pr95270-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-nohost-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/struct-copyout-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/struct-copyout-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/vector-length-64-1.c: Likewise.
	* testsuite/libgomp.oacc-fortran/attach-descriptor-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/derivedtypes-arrays-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-loop-2.f95: Likewise.
	* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: Likewise.
2022-01-18 17:25:36 +01:00
Thomas Schwinge
4bd8b1e881 Document current '-Wuninitialized'/'-Wmaybe-uninitialized' diagnostics for OpenACC test cases
... including "note: '[...]' was declared here" emitted since recent
commit 9695e1c23b
"Improve -Wuninitialized note location".

For those that seemed incorrect to me, I've placed XFAILed 'dg-bogus'es,
including one more instance of PR77504 etc., and several instances where
for "local variables" of reference-data-type reductions (etc.?) we emit
bogus (?) diagnostics.

For implicit data clauses (including 'firstprivate'), we seem to be missing
diagnostics, so I've placed XFAILed 'dg-warning's.

	gcc/testsuite/
	* c-c++-common/goacc/builtin-goacc-parlevel-id-size.c: Document
	current '-Wuninitialized' diagnostics.
	* c-c++-common/goacc/mdc-1.c: Likewise.
	* c-c++-common/goacc/nested-reductions-1-kernels.c: Likewise.
	* c-c++-common/goacc/nested-reductions-1-parallel.c: Likewise.
	* c-c++-common/goacc/nested-reductions-1-routine.c: Likewise.
	* c-c++-common/goacc/nested-reductions-2-kernels.c: Likewise.
	* c-c++-common/goacc/nested-reductions-2-parallel.c: Likewise.
	* c-c++-common/goacc/nested-reductions-2-routine.c: Likewise.
	* c-c++-common/goacc/uninit-dim-clause.c: Likewise.
	* c-c++-common/goacc/uninit-firstprivate-clause.c: Likewise.
	* c-c++-common/goacc/uninit-if-clause.c: Likewise.
	* gfortran.dg/goacc/array-with-dt-1.f90: Likewise.
	* gfortran.dg/goacc/array-with-dt-2.f90: Likewise.
	* gfortran.dg/goacc/array-with-dt-3.f90: Likewise.
	* gfortran.dg/goacc/array-with-dt-4.f90: Likewise.
	* gfortran.dg/goacc/array-with-dt-5.f90: Likewise.
	* gfortran.dg/goacc/derived-chartypes-1.f90: Likewise.
	* gfortran.dg/goacc/derived-chartypes-2.f90: Likewise.
	* gfortran.dg/goacc/derived-chartypes-3.f90: Likewise.
	* gfortran.dg/goacc/derived-chartypes-4.f90: Likewise.
	* gfortran.dg/goacc/derived-classtypes-1.f95: Likewise.
	* gfortran.dg/goacc/derived-types-2.f90: Likewise.
	* gfortran.dg/goacc/host_data-tree.f95: Likewise.
	* gfortran.dg/goacc/kernels-tree.f95: Likewise.
	* gfortran.dg/goacc/modules.f95: Likewise.
	* gfortran.dg/goacc/nested-reductions-1-kernels.f90: Likewise.
	* gfortran.dg/goacc/nested-reductions-1-parallel.f90: Likewise.
	* gfortran.dg/goacc/nested-reductions-1-routine.f90: Likewise.
	* gfortran.dg/goacc/nested-reductions-2-kernels.f90: Likewise.
	* gfortran.dg/goacc/nested-reductions-2-parallel.f90: Likewise.
	* gfortran.dg/goacc/nested-reductions-2-routine.f90: Likewise.
	* gfortran.dg/goacc/parallel-tree.f95: Likewise.
	* gfortran.dg/goacc/pr93464.f90: Likewise.
	* gfortran.dg/goacc/privatization-1-compute-loop.f90: Likewise.
	* gfortran.dg/goacc/privatization-1-compute.f90: Likewise.
	* gfortran.dg/goacc/privatization-1-routine_gang-loop.f90:
	Likewise.
	* gfortran.dg/goacc/privatization-1-routine_gang.f90: Likewise.
	* gfortran.dg/goacc/uninit-dim-clause.f95: Likewise.
	* gfortran.dg/goacc/uninit-firstprivate-clause.f95: Likewise.
	* gfortran.dg/goacc/uninit-if-clause.f95: Likewise.
	* gfortran.dg/goacc/uninit-use-device-clause.f95: Likewise.
	* gfortran.dg/goacc/wait.f90: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Document
	current '-Wuninitialized' diagnostics.
	* testsuite/libgomp.oacc-fortran/data-5.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/gemm-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/gemm.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/optional-reduction.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/parallel-reduction.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/pr70643.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/pr96628-part1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-5.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-7.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reference-reductions.f90:
	Likewise.
2022-01-13 11:52:35 +01:00
Julian Brown
e52253bcc0 Wait at end of OpenACC asynchronous kernels regions
In OpenACC 'kernels' decomposition, we're improperly nesting synchronous and
asynchronous data and compute regions, giving rise to data races when the
asynchronicity is actually executed, as is visible in at least on test case
with GCN offloading.

The proper fix is to correctly use the asynchronous interfaces, making the
currently synchronous data regions fully asynchronous (see also
<https://gcc.gnu.org/PR97390> "[OpenACC] 'async' clause on 'data' construct",
which is to share the same implementation), but that's for later; for now add
some more synchronization.

	gcc/
	* omp-oacc-kernels-decompose.cc (add_wait): New function, split out
	of...
	(add_async_clauses_and_wait): ...here. Call new outlined function.
	(decompose_kernels_region_body): Add wait at the end of
	explicitly-asynchronous kernels regions.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Remove GCN
	offloading execution XFAIL.

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2022-01-13 10:42:17 +01:00
Thomas Schwinge
9b32c1669a OpenACC 'kernels' decomposition: Mark variables used in synthesized data clauses as addressable [PR100280]
... as otherwise 'gcc/omp-low.c:lower_omp_target' has to create a temporary:

    13073			else if (is_gimple_reg (var))
    13074			  {
    13075			    gcc_assert (offloaded);
    13076			    tree avar = create_tmp_var (TREE_TYPE (var));
    13077			    mark_addressable (avar);

..., which (a) is only implemented for actualy *offloaded* regions (but not
data regions), and (b) the subsequently synthesized code for writing to and
later reading back from the temporary fundamentally conflicts with OpenACC
'async' (as used by OpenACC 'kernels' decomposition).  That's all not trivial
to make work, so let's just avoid this case.

	gcc/
	PR middle-end/100280
	* omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region):
	Mark variables used in synthesized data clauses as addressable.
	gcc/testsuite/
	PR middle-end/100280
	* c-c++-common/goacc/kernels-decompose-pr100280-1.c: New.
	* c-c++-common/goacc/classify-kernels-parloops.c: Likewise.
	* c-c++-common/goacc/classify-kernels-unparallelized-parloops.c:
	Likewise.
	* c-c++-common/goacc/classify-kernels-unparallelized.c: Test
	'--param openacc-kernels=decompose'.
	* c-c++-common/goacc/classify-kernels.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-2.c: Update.
	* c-c++-common/goacc/kernels-decompose-ice-1.c: Remove.
	* c-c++-common/goacc/kernels-decompose-ice-2.c: Likewise.
	* gfortran.dg/goacc/classify-kernels-parloops.f95: New.
	* gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95:
	Likewise.
	* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Test
	'--param openacc-kernels=decompose'.
	* gfortran.dg/goacc/classify-kernels.f95: Likewise.
	libgomp/
	PR middle-end/100280
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
	Update.
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Likewise.

Suggested-by: Julian Brown <julian@codesourcery.com>
2022-01-13 10:42:17 +01:00
Thomas Schwinge
862e5f398b Enhance OpenACC 'kernels' decomposition testing
gcc/testsuite/
	* c-c++-common/goacc/kernels-decompose-1.c: Enhance.
	* c-c++-common/goacc/kernels-decompose-2.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-ice-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-ice-2.c: Likewise.
	* gfortran.dg/goacc/kernels-decompose-1.f95: Likewise.
	* gfortran.dg/goacc/kernels-decompose-2.f95: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
	Enhance.
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Likewise.
	* testsuite/libgomp.oacc-fortran/asyncwait-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/asyncwait-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/asyncwait-3.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise.
2022-01-13 10:42:17 +01:00
Tobias Burnus
72dc270be7 libgomp.oacc-c-c++-common/loop-gwv-2.c: Use __builtin_alloca
Some systems do not have <alloca.h> but provide alloca differently, e.g.
via stdlib.h. Do it like other testcases do and use __builtin_alloca.

libgomp/ChangeLog:

	PR testsuite/102910
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: Use __builtin_alloca
	instead of #include <alloca.h> + alloca.
2021-10-25 20:48:38 +02:00
Julian Brown
2a3f9f6532 openacc: Shared memory layout optimisation
This patch implements an algorithm to lay out local data-share (LDS)
space.  It currently works for AMD GCN.  At the moment, LDS is used for
three things:

  1. Gang-private variables
  2. Reduction temporaries (accumulators)
  3. Broadcasting for worker partitioning

After the patch is applied, (2) and (3) are placed at preallocated
locations in LDS, and (1) continues to be handled by the backend (as it
is at present prior to this patch being applied). LDS now looks like this:

  +--------------+ (gang-private size + 1024, = 1536)
  | free space   |
  |    ...       |
  | - - - - - - -|
  | worker bcast |
  +--------------+
  | reductions   |
  +--------------+ <<< -mgang-private-size=<number> (def. 512)
  | gang-private |
  |    vars      |
  +--------------+ (32)
  | low LDS vars |
  +--------------+ LDS base

So, gang-private space is fixed at a constant amount at compile time
(which can be increased with a command-line switch if necessary
for some given code). The layout algorithm takes out a slice of the
remainder of usable space for reduction vars, and uses the rest for
worker partitioning.

The partitioning algorithm works as follows.

 1. An "adjacency" set is built up for each basic block that might
    do a broadcast. This is calculated by starting at each such block,
    and doing a recursive DFS walk over successors to find the next
    block (or blocks) that *also* does a broadcast
    (dfs_broadcast_reachable_1).

 2. The adjacency set is inverted to get adjacent predecessor blocks also.

 3. Blocks that will perform a broadcast are sorted by size of that
    broadcast: the biggest blocks are handled first.

 4. A splay tree structure is used to calculate the spans of LDS memory
    that are already allocated by the blocks adjacent to this one
    (merge_ranges{,_1}.

 5. The current block's broadcast space is allocated from the first free
    span not allocated in the splay tree structure calculated above
    (first_fit_range). This seems to work quite nicely and efficiently
    with the splay tree structure.

 6. Continue with the next-biggest broadcast block until we're done.

In this way, "adjacent" broadcasts will not use the same piece of
LDS memory.

PR96334 "openacc: Unshare reduction temporaries for GCN" got merged in:

The GCN backend uses tree nodes like MEM((__lds TYPE *) <constant>)
for reduction temporaries. Unlike e.g. var decls and SSA names, these
nodes cannot be shared during gimplification, but are so in some
circumstances. This is detected when appropriate --enable-checking
options are used. This patch unshares such nodes when they are reused
more than once.

gcc/
	* config/gcn/gcn-protos.h
	(gcn_goacc_create_worker_broadcast_record): Update prototype.
	* config/gcn/gcn-tree.c (gcn_goacc_get_worker_red_decl): Use
	preallocated block of LDS memory.  Do not cache/share decls for
	reduction temporaries between invocations.
	(gcn_goacc_reduction_teardown): Unshare VAR on second use.
	(gcn_goacc_create_worker_broadcast_record): Add OFFSET parameter
	and return temporary LDS space at that offset.  Return pointer in
	"sender" case.
	* config/gcn/gcn.c (acc_lds_size, gang_private_hwm, lds_allocs):
	New global vars.
	(ACC_LDS_SIZE): Define as acc_lds_size.
	(gcn_init_machine_status): Don't initialise lds_allocated,
	lds_allocs, reduc_decls fields of machine function struct.
	(gcn_option_override): Handle default size for gang-private
	variables and -mgang-private-size option.
	(gcn_expand_prologue): Use LDS_SIZE instead of LDS_SIZE-1 when
	initialising M0_REG.
	(gcn_shared_mem_layout): New function.
	(gcn_print_lds_decl): Update comment. Use global lds_allocs map and
	gang_private_hwm variable.
	(TARGET_GOACC_SHARED_MEM_LAYOUT): Define target hook.
	* config/gcn/gcn.h (machine_function): Remove lds_allocated,
	lds_allocs, reduc_decls. Add reduction_base, reduction_limit.
	* config/gcn/gcn.opt (gang_private_size_opt): New global.
	(mgang-private-size=): New option.
	* doc/tm.texi.in (TARGET_GOACC_SHARED_MEM_LAYOUT): Place
	documentation hook.
	* doc/tm.texi: Regenerate.
	* omp-oacc-neuter-broadcast.cc (targhooks.h, diagnostic-core.h):
	Add includes.
	(build_sender_ref): Handle sender_decl being pointer.
	(worker_single_copy): Add PLACEMENT and ISOLATE_BROADCASTS
	parameters.  Pass placement argument to
	create_worker_broadcast_record hook invocations.  Handle
	sender_decl being pointer and isolate_broadcasts inserting extra
	barriers.
	(blk_offset_map_t): Add typedef.
	(neuter_worker_single): Add BLK_OFFSET_MAP parameter.  Pass
	preallocated range to worker_single_copy call.
	(dfs_broadcast_reachable_1): New function.
	(idx_decl_pair_t, used_range_vec_t): New typedefs.
	(sort_size_descending): New function.
	(addr_range): New class.
	(splay_tree_compare_addr_range, splay_tree_free_key)
	(first_fit_range, merge_ranges_1, merge_ranges): New functions.
	(execute_omp_oacc_neuter_broadcast): Rename to...
	(oacc_do_neutering): ... this.  Add BOUNDS_LO, BOUNDS_HI
	parameters.  Arrange layout of shared memory for broadcast
	operations.
	(execute_omp_oacc_neuter_broadcast): New function.
	(pass_omp_oacc_neuter_broadcast::gate): Remove num_workers==1
	handling from here.  Enable pass for all OpenACC routines in order
	to call shared memory-layout hook.
	* target.def (create_worker_broadcast_record): Add OFFSET
	parameter.
	(shared_mem_layout): New hook.
libgomp/
	* testsuite/libgomp.oacc-c-c++-common/broadcast-many.c: Update.
2021-09-17 21:04:30 +02:00
Julian Brown
8251f90e87 Add 'libgomp.oacc-c-c++-common/broadcast-many.c'
libgomp/
	* testsuite/libgomp.oacc-c-c++-common/broadcast-many.c: New test.
2021-09-17 21:04:29 +02:00
Thomas Schwinge
a2ab2f0dfb Address '?:' issues in 'libgomp.oacc-c-c++-common/mode-transitions.c'
[...]/libgomp.oacc-c-c++-common/mode-transitions.c: In function ‘t3’:
    [...]/libgomp.oacc-c-c++-common/mode-transitions.c:127:43: warning: ‘?:’ using integer constants in boolean context, the expression will always evaluate to ‘true’ [-Wint-in-bool-context]
      127 |     assert (arr[i] == ((i % 64) < 32) ? 1 : -1);
          |                                           ^

    [...]/libgomp.oacc-c-c++-common/mode-transitions.c: In function ‘t9’:
    [...]/libgomp.oacc-c-c++-common/mode-transitions.c:359:46: warning: ‘?:’ using integer constants in boolean context, the expression will always evaluate to ‘true’ [-Wint-in-bool-context]
      359 |         assert (arr[i] == ((i % 3) == 0) ? 1 : 2);
          |                                              ^

..., and PR101862 "[C, C++] Potential '?:' diagnostic for always-true
expressions in boolean context".

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/mode-transitions.c: Address
	'?:' issues.
2021-08-16 12:12:09 +02:00
Thomas Schwinge
2cc65fcbd4 Adjust 'libgomp.oacc-c-c++-common/static-variable-1.c'
... for 'gcc/gimplify.c:gimplify_scan_omp_clauses' changes in recent
commit d0befed793 "openmp: Add support
for OpenMP 5.1 masked construct".

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/static-variable-1.c: Adjust.
2021-08-13 22:53:58 +02:00
Julian Brown
c408512e1f amdgcn: Enable OpenACC worker partitioning for AMD GCN
gcc/
	* config/gcn/gcn.c (gcn_init_builtins): Override decls for
	BUILT_IN_GOACC_SINGLE_START, BUILT_IN_GOACC_SINGLE_COPY_START,
	BUILT_IN_GOACC_SINGLE_COPY_END and BUILT_IN_GOACC_BARRIER.
	(gcn_goacc_validate_dims): Turn on worker partitioning unconditionally.
	(gcn_fork_join): Update comment.
	* config/gcn/gcn.opt (flag_worker_partitioning): Remove.
	(macc_experimental_workers): Remove unused option.
	libgomp/
	* plugin/plugin-gcn.c (gcn_exec): Change default number of workers to
	16.
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
	[acc_device_radeon]: Update.
	* testsuite/libgomp.oacc-c-c++-common/loop-dim-default.c
	[ACC_DEVICE_TYPE_radeon]: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
	[acc_device_radeon]: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-wv-2.c
	[ACC_DEVICE_TYPE_radeon]: Likewise.
	* testsuite/libgomp.oacc-fortran/optional-reduction.f90: XFAIL for
	'openacc_radeon_accel_selected' and '-O0'.
	* testsuite/libgomp.oacc-fortran/reduction-7.f90: Likewise.

Co-Authored-By: Kwok Cheung Yeung <kcy@codesourcery.com>
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2021-08-09 15:08:44 +02:00
Thomas Schwinge
0829ab79d3 [OpenACC] Extract 'pass_oacc_loop_designation' out of 'pass_oacc_device_lower'
This really is a separate step -- and another pass to be added between the two,
later on.

	gcc/
	* omp-offload.c (oacc_loop_xform_head_tail, oacc_loop_process):
	'update_stmt' after modification.
	(pass_oacc_loop_designation): New function, extracted out of...
	(pass_oacc_device_lower): ... this.
	(pass_data_oacc_loop_designation, pass_oacc_loop_designation)
	(make_pass_oacc_loop_designation): New
	* passes.def: Add it.
	* tree-parloops.c (create_parallel_loop): Adjust.
	* tree-pass.h (make_pass_oacc_loop_designation): New.
	gcc/testsuite/
	* c-c++-common/goacc/classify-kernels-unparallelized.c:
	's%oaccdevlow%oaccloops%g'.
	* c-c++-common/goacc/classify-kernels.c: Likewise.
	* c-c++-common/goacc/classify-parallel.c: Likewise.
	* c-c++-common/goacc/classify-routine-nohost.c: Likewise.
	* c-c++-common/goacc/classify-routine.c: Likewise.
	* c-c++-common/goacc/classify-serial.c: Likewise.
	* c-c++-common/goacc/routine-nohost-1.c: Likewise.
	* g++.dg/goacc/template.C: Likewise.
	* gcc.dg/goacc/loop-processing-1.c: Likewise.
	* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Likewise.
	* gfortran.dg/goacc/classify-kernels.f95: Likewise.
	* gfortran.dg/goacc/classify-parallel.f95: Likewise.
	* gfortran.dg/goacc/classify-routine-nohost.f95: Likewise.
	* gfortran.dg/goacc/classify-routine.f95: Likewise.
	* gfortran.dg/goacc/classify-serial.f95: Likewise.
	* gfortran.dg/goacc/routine-multiple-directives-1.f90: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/pr85486-2.c:
	's%oaccdevlow%oaccloops%g'.
	* testsuite/libgomp.oacc-c-c++-common/pr85486-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/pr85486.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c:
	Likewise.
	* testsuite/libgomp.oacc-fortran/routine-nohost-1.f90: Likewise.

Co-Authored-By: Julian Brown <julian@codesourcery.com>
Co-Authored-By: Kwok Cheung Yeung <kcy@codesourcery.com>
2021-07-29 09:19:44 +02:00
Thomas Schwinge
d88a695158 Don't use libgomp 'cbuf' buffering with OpenACC 'async'
The host data might not be computed yet (by an earlier asynchronous compute
region, for example.

	libgomp/
	* target.c (gomp_coalesce_buf_add): Update comment.
	(gomp_copy_host2dev, gomp_map_vars_internal): Don't expect to see
	'aq && cbuf'.
	(gomp_map_vars_internal): Only 'if (!aq)', do
	'gomp_coalesce_buf_add'.
	* testsuite/libgomp.oacc-c-c++-common/async-data-1-2.c: Remove
	XFAIL.

Co-Authored-By: Julian Brown <julian@codesourcery.com>
2021-07-27 11:16:37 +02:00
Julian Brown
9c41f5b9cd Fix OpenACC "ephemeral" asynchronous host-to-device copies
This patch fixes several places in libgomp/target.c where "ephemeral" data
(on the stack or in temporary heap locations) may be used as the source of
an asynchronous host-to-device copy that may not complete before the host
data disappears.

An existing, but flawed, workaround for this problem in the AMD GCN
libgomp offloading plugin is currently present on mainline, and was
posted for the og9 branch here:

  https://gcc.gnu.org/legacy-ml/gcc-patches/2019-08/msg00901.html

and previous versions of this patch were posted here (for mainline/og9):

  https://gcc.gnu.org/legacy-ml/gcc-patches/2019-11/msg01482.html
  https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01026.html

libgomp/
	* libgomp.h (gomp_copy_host2dev): Update prototype.
	* oacc-mem.c (memcpy_tofrom_device, update_dev_host): Add new
	argument to gomp_copy_host2dev (false).
	* plugin/plugin-gcn.c (struct copy_data): Remove free_src field.
	(copy_data): Don't free src.
	(queue_push_copy): Remove free_src handling.
	(GOMP_OFFLOAD_dev2dev): Update call to queue_push_copy.
	(GOMP_OFFLOAD_openacc_async_host2dev): Remove source-data
	snapshotting.
	(GOMP_OFFLOAD_openacc_async_dev2host): Update call to
	queue_push_copy.
	* target.c (goacc_device_copy_async): Add SRCADDR_ORIG parameter.
	(gomp_copy_host2dev): Add EPHEMERAL parameter.  Snapshot source
	data when true, and set up deferred freeing of temporary buffer.
	(gomp_copy_dev2host): Update call to goacc_device_copy_async.
	(gomp_map_vars_existing, gomp_map_pointer, gomp_attach_pointer)
	(gomp_detach_pointer, gomp_map_vars_internal, gomp_update): Update
	calls to gomp_copy_host2dev with appropriate ephemeral argument.
	* testsuite/libgomp.oacc-c-c++-common/async-data-1-1.c: Remove
	XFAIL.

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2021-07-27 11:16:27 +02:00
Thomas Schwinge
88c40c36db Add 'libgomp.oacc-c-c++-common/async-data-1-{1,2}.c'
libgomp/
	* testsuite/libgomp.oacc-c-c++-common/async-data-1-1.c: New file.
	* testsuite/libgomp.oacc-c-c++-common/async-data-1-2.c: Likewise.

Co-Authored-By: Tom de Vries <tom@codesourcery.com>
2021-07-27 11:16:26 +02:00
Thomas Schwinge
29ddaf43f7 [OpenACC] Clarify sequencing of 'async' data copying vs. profiling events in 'libgomp.oacc-c-c++-common/acc_prof-{init,parallel}-1.c'
... as noticed with GCN offloading.

Fix-up for r271346 (commit 5fae049dc2)
"OpenACC Profiling Interface (incomplete)".

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c: Clarify
	sequencing of 'async' data copying vs. profiling events.
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c:
	Likewise.
2021-07-27 11:16:25 +02:00
Thomas Schwinge
599e275d7e Fix OpenACC 'async'/'wait' issues in 'libgomp.oacc-c-c++-common/lib-{94,95}.c', 'libgomp.oacc-fortran/lib-16{,-2}.f90'
Fix-up for r265842 (commit 58168bbf6f)
"[OpenACC 2.5, libgomp] Add *_async versions of runtime library API functions".

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/lib-94.c: Fix OpenACC
	'async'/'wait' issue.
	* testsuite/libgomp.oacc-c-c++-common/lib-95.c: Likewise.
	* testsuite/libgomp.oacc-fortran/lib-16-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/lib-16.f90: Likewise.

Co-Authored-By: Julian Brown <julian@codesourcery.com>
2021-07-27 11:16:24 +02:00
Thomas Schwinge
a61f6afbee OpenACC 'nohost' clause
Do not "compile a version of this procedure for the host".

	gcc/
	* tree-core.h (omp_clause_code): Add 'OMP_CLAUSE_NOHOST'.
	* tree.c (omp_clause_num_ops, omp_clause_code_name, walk_tree_1):
	Handle it.
	* tree-pretty-print.c (dump_omp_clause): Likewise.
	* omp-general.c (oacc_verify_routine_clauses): Likewise.
	* gimplify.c (gimplify_scan_omp_clauses)
	(gimplify_adjust_omp_clauses): Likewise.
	* tree-nested.c (convert_nonlocal_omp_clauses)
	(convert_local_omp_clauses): Likewise.
	* omp-low.c (scan_sharing_clauses): Likewise.
	* omp-offload.c (execute_oacc_device_lower): Update.
	gcc/c-family/
	* c-pragma.h (pragma_omp_clause): Add 'PRAGMA_OACC_CLAUSE_NOHOST'.
	gcc/c/
	* c-parser.c (c_parser_omp_clause_name): Handle 'nohost'.
	(c_parser_oacc_all_clauses): Handle 'PRAGMA_OACC_CLAUSE_NOHOST'.
	(OACC_ROUTINE_CLAUSE_MASK): Add 'PRAGMA_OACC_CLAUSE_NOHOST'.
	* c-typeck.c (c_finish_omp_clauses): Handle 'OMP_CLAUSE_NOHOST'.
	gcc/cp/
	* parser.c (cp_parser_omp_clause_name): Handle 'nohost'.
	(cp_parser_oacc_all_clauses): Handle 'PRAGMA_OACC_CLAUSE_NOHOST'.
	(OACC_ROUTINE_CLAUSE_MASK): Add 'PRAGMA_OACC_CLAUSE_NOHOST'.
	* pt.c (tsubst_omp_clauses): Handle 'OMP_CLAUSE_NOHOST'.
	* semantics.c (finish_omp_clauses): Likewise.
	gcc/fortran/
	* dump-parse-tree.c (show_attr): Update.
	* gfortran.h (symbol_attribute): Add 'oacc_routine_nohost' member.
	(gfc_omp_clauses): Add 'nohost' member.
	* module.c (ab_attribute): Add 'AB_OACC_ROUTINE_NOHOST'.
	(attr_bits, mio_symbol_attribute): Update.
	* openmp.c (omp_mask2): Add 'OMP_CLAUSE_NOHOST'.
	(gfc_match_omp_clauses): Handle 'OMP_CLAUSE_NOHOST'.
	(OACC_ROUTINE_CLAUSES): Add 'OMP_CLAUSE_NOHOST'.
	(gfc_match_oacc_routine): Update.
	* trans-decl.c (add_attributes_to_decl): Update.
	* trans-openmp.c (gfc_trans_omp_clauses): Likewise.
	gcc/testsuite/
	* c-c++-common/goacc/classify-routine-nohost.c: New file.
	* c-c++-common/goacc/classify-routine.c: Update.
	* c-c++-common/goacc/routine-2.c: Likewise.
	* c-c++-common/goacc/routine-nohost-1.c: New file.
	* c-c++-common/goacc/routine-nohost-2.c: Likewise.
	* g++.dg/goacc/template.C: Update.
	* gfortran.dg/goacc/classify-routine-nohost.f95: New file.
	* gfortran.dg/goacc/classify-routine.f95: Update.
	* gfortran.dg/goacc/pure-elemental-procedures-2.f90: Likewise.
	* gfortran.dg/goacc/routine-6.f90: Likewise.
	* gfortran.dg/goacc/routine-intrinsic-2.f: Likewise.
	* gfortran.dg/goacc/routine-module-1.f90: Likewise.
	* gfortran.dg/goacc/routine-module-2.f90: Likewise.
	* gfortran.dg/goacc/routine-module-3.f90: Likewise.
	* gfortran.dg/goacc/routine-module-mod-1.f90: Likewise.
	* gfortran.dg/goacc/routine-multiple-directives-1.f90: Likewise.
	* gfortran.dg/goacc/routine-multiple-directives-2.f90: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c: New
	file.
	* testsuite/libgomp.oacc-c-c++-common/routine-nohost-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-nohost-2_2.c:
	Likewise.
	* testsuite/libgomp.oacc-fortran/routine-nohost-1.f90: Likewise.

Co-Authored-By: Joseph Myers <joseph@codesourcery.com>
Co-Authored-By: Cesar Philippidis <cesar@codesourcery.com>
2021-07-21 23:58:11 +02:00
Thomas Schwinge
30656822b3 [GCN] Fix run-time variable 'num_workers'
... which currently has *not* been forced to 'num_workers (1)'.

In addition to the testcases modified here, this also fixes:

    FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/mode-transitions.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O0  execution test
    [Etc.]

    mode-transitions.exe: [...]/libgomp.oacc-c-c++-common/mode-transitions.c:702: t17: Assertion `arr_b[i] == (i ^ 31) * 8' failed.

	libgomp/
	* plugin/plugin-gcn.c (gcn_exec): Force 'num_workers (1)'
	unconditionally.
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c:
	Update.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-wv-2.c: Likewise.
2021-06-08 12:00:15 +02:00
Thomas Schwinge
c68ddd5e2a Enable more 'libgomp.oacc-*/lib-*' testcases for non-'openacc_nvidia_accel_selected'
libgomp/
	* testsuite/libgomp.oacc-c-c++-common/lib-11.c: Enable for all but
	'-DACC_MEM_SHARED=0'.
	* testsuite/libgomp.oacc-c-c++-common/lib-13.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-14.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-15.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-20.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-23.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-24.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-34.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-42.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-44.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-48.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-88.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-89.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-92.c: Likewise.
	* testsuite/libgomp.oacc-fortran/lib-14.f90: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-5.c: Add
	'acc_device_radeon' testing.
	* testsuite/libgomp.oacc-c-c++-common/lib-6.c: Likewise.
	* testsuite/libgomp.oacc-fortran/lib-5.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/lib-7.f90: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-52.c: Enable for all.
	* testsuite/libgomp.oacc-c-c++-common/lib-53.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-54.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-86.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-87.c: Likewise.
	* testsuite/libgomp.oacc-fortran/lib-10.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/lib-8.f90: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-57.c: Improve checking
	for non-'openacc_nvidia_accel_selected'.
	* testsuite/libgomp.oacc-c-c++-common/lib-58.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-62.c: Clarify that "Not
	all implement this checking".
	* testsuite/libgomp.oacc-c-c++-common/lib-63.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-64.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-65.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-67.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-68.c: Likewise.
2021-06-08 11:51:45 +02:00
Thomas Schwinge
32099c0d24 Fix 'libgomp.oacc-fortran/parallel-dims.f90' for 'acc_device_radeon'
..., by simplifying 'libgomp.oacc-c-c++-common/parallel-dims.c', and updating
the former correspondingly.  '__builtin_goacc_parlevel_id' does the right thing
for all 'acc_device_*'.

Follow-up to commit 09e0ad6253 "Update OpenACC
tests for amdgcn".

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Simplify.
	* testsuite/libgomp.oacc-fortran/parallel-dims-aux.c: Update.
2021-06-08 11:41:52 +02:00
Thomas Schwinge
984df1e163 Fix 'libgomp.oacc-c-c++-common/acc_prof-kernels-1.c' for 'acc_device_radeon'
... on top of r279378 (commit 26b74ed022)
"Update OpenACC tests for amdgcn".

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Fix
	for 'acc_device_radeon'.
2021-06-08 11:33:41 +02:00
Thomas Schwinge
292fb10beb Enhance 'libgomp.oacc-c-c++-common/firstprivate-1.c' for non-'acc_device_nvidia'
libgomp/
	* testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c: Enhance
	for non-'acc_device_nvidia'.
2021-06-08 11:31:49 +02:00
Thomas Schwinge
97a040e987 Add 'acc_device_radeon' testing to 'libgomp.oacc-*/acc_on_device-*'
libgomp/
	* testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c: Add
	'acc_device_radeon' testing.
	* testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Likewise.
	* testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Likewise.
2021-06-08 11:28:53 +02:00
Thomas Schwinge
89c1a427a1 Don't require 'openacc_nvidia_accel_selected' in 'libgomp.oacc-c-c++-common/async_queue-1.c'
That is, re-enable it for host-fallback, and enable it for GCN offloading.

Fix-up for r279378 (commit 26b74ed022)
"Update OpenACC tests for amdgcn".

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/async_queue-1.c: Don't
	require 'openacc_nvidia_accel_selected'.  Fix up for
	'ACC_DEVICE_TYPE_radeon'.
2021-06-08 11:23:31 +02:00
Thomas Schwinge
77f41a5c4e Don't require 'openacc_nvidia_accel_selected' in additional 'libgomp.oacc-*/declare-*'
Like r253779 (commit 92d5d01ac6)
"Enable libgomp.oacc-*/declare-*.{c,f90} for non-nvidia devices".

	libgomp/
	* testsuite/libgomp.oacc-c++/declare-1.C: Don't require
	'openacc_nvidia_accel_selected'.
	* testsuite/libgomp.oacc-c-c++-common/declare-3.c: Likewise.
2021-06-08 11:21:47 +02:00
Thomas Schwinge
0886426f5f Revert PR80547 workaround in 'libgomp.oacc-c-c++-common/parallel-dims.c'
This problem has been fixed long ago, in r267934 (commit
d41d952c9b) "[nvptx] Handle assignment to
gang-level reduction variable".

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Revert
	PR80547 workaround.
2021-06-08 11:10:55 +02:00
Thomas Schwinge
e64d62c700 [nvptx] Update comment in 'libgomp.oacc-c-c++-common/parallel-dims.c'
Small fix-up for r267889 (commit 2b9d9e3937)
"[nvptx] Enable large vectors":

> 	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Expect vector
> 	length 2097152 to be reduced to 1024 instead of 32.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
	<acc_device_nvidia>: Update comment.
2021-06-08 11:06:30 +02:00
Jakub Jelinek
79e3f7d54b libgomp: Add openacc_{cuda,cublas,cudart} effective targets and use them in openacc testsuite
When gcc is configured for nvptx offloading with --without-cuda-driver
and full CUDA isn't installed, many libgomp.oacc-*/* tests fail,
some of them because cuda.h header can't be found, others because
the tests can't be linked against -lcuda, -lcudart or -lcublas.
I usually only have akmod-nvidia and xorg-x11-drv-nvidia-cuda rpms
installed, so libcuda.so.1 can be dlopened and the offloading works,
but linking against those libraries isn't possible nor are the
headers around (for the plugin itself there is the fallback
libgomp/plugin/cuda/cuda.h).

The following patch adds 3 new effective targets and uses them in tests that
needs those.

2021-05-27  Jakub Jelinek  <jakub@redhat.com>

	* testsuite/lib/libgomp.exp (check_effective_target_openacc_cuda,
	check_effective_target_openacc_cublas,
	check_effective_target_openacc_cudart): New.
	* testsuite/libgomp.oacc-fortran/host_data-4.f90: Require effective
	target openacc_cublas.
	* testsuite/libgomp.oacc-fortran/host_data-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/host_data-3.f: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-91.c: Require effective
	target openacc_cuda.
	* testsuite/libgomp.oacc-c-c++-common/lib-70.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-90.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-75.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-69.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-74.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-81.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-72.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-85.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/pr87835.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-82.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-73.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-83.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-78.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-76.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-84.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-79.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/host_data-1.c: Require effective
	targets openacc_cublas and openacc_cudart.
	* testsuite/libgomp.oacc-c-c++-common/context-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/context-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/context-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/context-4.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-nvptx.c:
	Require effective target openacc_cudart.
	* testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c: Add -DUSE_CUDA_H
	for effective target openacc_cuda and add && defined USE_CUDA_H to
	preprocessor conditionals.  Guard -lcuda also on openacc_cuda
	effective target.
2021-05-27 22:44:36 +02:00
Thomas Schwinge
325aa13996 [OpenACC privatization] Reject 'static', 'external' in blocks [PR90115]
gcc/
	PR middle-end/90115
	* omp-low.c (oacc_privatization_candidate_p): Reject 'static',
	'external' in blocks.
	gcc/testsuite/
	PR middle-end/90115
	* c-c++-common/goacc/privatization-1-compute-loop.c: Update.
	* c-c++-common/goacc/privatization-1-compute.c: Likewise.
	* c-c++-common/goacc/privatization-1-routine_gang-loop.c:
	Likewise.
	* c-c++-common/goacc/privatization-1-routine_gang.c: Likewise.
	libgomp/
	PR middle-end/90115
	* testsuite/libgomp.oacc-c-c++-common/static-variable-1.c: Update.
	* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Likewise.
2021-05-21 20:23:34 +02:00
Thomas Schwinge
11b8286a83 [OpenACC privatization] Largely extend diagnostics and corresponding testsuite coverage [PR90115]
gcc/
	PR middle-end/90115
	* flag-types.h (enum openacc_privatization): New.
	* params.opt (-param=openacc-privatization): New.
	* doc/invoke.texi (openacc-privatization): Document it.
	* omp-general.h (get_openacc_privatization_dump_flags): New
	function.
	* omp-low.c (oacc_privatization_candidate_p): Add diagnostics.
	* omp-offload.c (execute_oacc_device_lower)
	<IFN_UNIQUE_OACC_PRIVATE>: Re-work diagnostics.
	* target.def (goacc.adjust_private_decl): Add 'location_t'
	parameter.
	* doc/tm.texi: Regenerate.
	* config/gcn/gcn-protos.h (gcn_goacc_adjust_private_decl): Adjust.
	* config/gcn/gcn-tree.c (gcn_goacc_adjust_private_decl): Likewise.
	* config/nvptx/nvptx.c (nvptx_goacc_adjust_private_decl):
	Likewise.  Preserve it for...
	(nvptx_goacc_expand_var_decl): ... use here.
	gcc/testsuite/
	PR middle-end/90115
	* c-c++-common/goacc/privatization-1-compute-loop.c: New file.
	* c-c++-common/goacc/privatization-1-compute.c: Likewise.
	* c-c++-common/goacc/privatization-1-routine_gang-loop.c:
	Likewise.
	* c-c++-common/goacc/privatization-1-routine_gang.c: Likewise.
	* gfortran.dg/goacc/privatization-1-compute-loop.f90: Likewise.
	* gfortran.dg/goacc/privatization-1-compute.f90: Likewise.
	* gfortran.dg/goacc/privatization-1-routine_gang-loop.f90:
	Likewise.
	* gfortran.dg/goacc/privatization-1-routine_gang.f90: Likewise.
	* c-c++-common/goacc-gomp/nesting-1.c: Update.
	* c-c++-common/goacc/private-reduction-1.c: Likewise.
	* gfortran.dg/goacc/private-3.f95: Likewise.
	libgomp/
	PR middle-end/90115
	* testsuite/libgomp.oacc-fortran/private-atomic-1-vector.f90: New
	file.
	* testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c: Update.
	* testsuite/libgomp.oacc-c-c++-common/host_data-7.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-3.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-4.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-5.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-6.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-v-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/private-atomic-1-gang.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/private-atomic-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/private-variables.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-4.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/static-variable-1.c:
	Likewise.
	* testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Likewise.
	* testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Likewise.
	* testsuite/libgomp.oacc-fortran/declare-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/host_data-5.F90: Likewise.
	* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-1.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-1.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-2.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-1.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-2.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-3.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-4.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-5.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-6.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-7.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/optional-private.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/private-atomic-1-gang.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/private-atomic-1-worker.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/private-variables.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/routine-7.f90: Likewise.
2021-05-21 20:09:59 +02:00
Julian Brown
29a2f51806 openacc: Add support for gang local storage allocation in shared memory [PR90115]
This patch implements a method to track the "private-ness" of
OpenACC variables declared in offload regions in gang-partitioned,
worker-partitioned or vector-partitioned modes. Variables declared
implicitly in scoped blocks and those declared "private" on enclosing
directives (e.g. "acc parallel") are both handled. Variables that are
e.g. gang-private can then be adjusted so they reside in GPU shared
memory.

The reason for doing this is twofold: correct implementation of OpenACC
semantics, and optimisation, since shared memory might be faster than
the main memory on a GPU. Handling of private variables is intimately
tied to the execution model for gangs/workers/vectors implemented by
a particular target: for current targets, we use (or on mainline, will
soon use) a broadcasting/neutering scheme.

That is sufficient for code that e.g. sets a variable in worker-single
mode and expects to use the value in worker-partitioned mode. The
difficulty (semantics-wise) comes when the user wants to do something like
an atomic operation in worker-partitioned mode and expects a worker-single
(gang private) variable to be shared across each partitioned worker.
Forcing use of shared memory for such variables makes that work properly.

In terms of implementation, the parallelism level of a given loop is
not fixed until the oaccdevlow pass in the offload compiler, so the
patch delays fixing the parallelism level of variables declared on or
within such loops until the same point. This is done by adding a new
internal UNIQUE function (OACC_PRIVATE) that lists (the address of) each
private variable as an argument, and other arguments set so as to be able
to determine the correct parallelism level to use for the listed
variables. This new internal function fits into the existing scheme for
demarcating OpenACC loops, as described in comments in the patch.

Two new target hooks are introduced: TARGET_GOACC_ADJUST_PRIVATE_DECL and
TARGET_GOACC_EXPAND_VAR_DECL.  The first can tweak a variable declaration
at oaccdevlow time, and the second at expand time.  The first or both
of these target hooks can be used by a given offload target, depending
on its strategy for implementing private variables.

This patch updates the TARGET_GOACC_ADJUST_PRIVATE_DECL target hook in
the AMD GCN backend to the current name and prototype. (An earlier
version of the hook was already present, but dormant.)

	gcc/
	PR middle-end/90115
	* doc/tm.texi.in (TARGET_GOACC_EXPAND_VAR_DECL)
	(TARGET_GOACC_ADJUST_PRIVATE_DECL): Add documentation hooks.
	* doc/tm.texi: Regenerate.
	* expr.c (expand_expr_real_1): Expand decls using the
	expand_var_decl OpenACC hook if defined.
	* internal-fn.c (expand_UNIQUE): Handle IFN_UNIQUE_OACC_PRIVATE.
	* internal-fn.h (IFN_UNIQUE_CODES): Add OACC_PRIVATE.
	* omp-low.c (omp_context): Add oacc_privatization_candidates
	field.
	(lower_oacc_reductions): Add PRIVATE_MARKER parameter.  Insert
	before fork.
	(lower_oacc_head_tail): Add PRIVATE_MARKER parameter.  Modify
	private marker's gimple call arguments, and pass it to
	lower_oacc_reductions.
	(oacc_privatization_scan_clause_chain)
	(oacc_privatization_scan_decl_chain, lower_oacc_private_marker):
	New functions.
	(lower_omp_for, lower_omp_target, lower_omp_1): Use these.
	* omp-offload.c (convert.h): Include.
	(oacc_loop_xform_head_tail): Treat private-variable markers like
	fork/join when transforming head/tail sequences.
	(struct var_decl_rewrite_info): Add struct.
	(oacc_rewrite_var_decl, is_sync_builtin_call): New functions.
	(execute_oacc_device_lower): Support rewriting gang-private
	variables using target hook, and fix up addr_expr and var_decl
	nodes afterwards.
	* target.def (adjust_private_decl, expand_var_decl): New hooks.
	* config/gcn/gcn-protos.h (gcn_goacc_adjust_gangprivate_decl):
	Rename to...
	(gcn_goacc_adjust_private_decl): ...this.
	* config/gcn/gcn-tree.c (gcn_goacc_adjust_gangprivate_decl):
	Rename to...
	(gcn_goacc_adjust_private_decl): ...this. Add LEVEL parameter.
	* config/gcn/gcn.c (TARGET_GOACC_ADJUST_GANGPRIVATE_DECL): Rename
	definition using gcn_goacc_adjust_gangprivate_decl...
	(TARGET_GOACC_ADJUST_PRIVATE_DECL): ...to this, using
	gcn_goacc_adjust_private_decl.
	* config/nvptx/nvptx.c (tree-pretty-print.h): Include.
	(gang_private_shared_size): New global variable.
	(gang_private_shared_align): Likewise.
	(gang_private_shared_sym): Likewise.
	(gang_private_shared_hmap): Likewise.
	(nvptx_option_override): Initialize these.
	(nvptx_file_end): Output gang_private_shared_sym.
	(nvptx_goacc_adjust_private_decl, nvptx_goacc_expand_var_decl):
	New functions.
	(nvptx_set_current_function): Clear gang_private_shared_hmap.
	(TARGET_GOACC_ADJUST_PRIVATE_DECL): Define hook.
	(TARGET_GOACC_EXPAND_VAR_DECL): Likewise.
	libgomp/
	PR middle-end/90115
	* testsuite/libgomp.oacc-c-c++-common/private-atomic-1-gang.c: New
	test.
	* testsuite/libgomp.oacc-fortran/private-atomic-1-gang.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/private-atomic-1-worker.f90:
	Likewise.

Co-Authored-By: Chung-Lin Tang <cltang@codesourcery.com>
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2021-05-21 18:58:07 +02:00
Thomas Schwinge
1467100fc7 Add 'libgomp.oacc-c-c++-common/private-atomic-1.c' [PR83812]
... to at least document/test/XFAIL nvptx offloading: PR83812 "operation not
supported on global/shared address space".

	libgomp/
	PR target/83812
	* testsuite/libgomp.oacc-c-c++-common/private-atomic-1.c: New.
2021-05-19 14:23:29 +02:00
Julian Brown
5a16fb19e7 Add 'libgomp.oacc-c-c++-common/loop-gwv-2.c'
libgomp/
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New.
2021-05-19 13:58:38 +02:00
Martin Liska
810afb0b5f testsuite: prune new LTO warning
libgomp/ChangeLog:

	PR testsuite/100569
	* testsuite/libgomp.c/omp-nested-3.c: Prune new LTO warning.
	* testsuite/libgomp.c/pr46032-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c: Likewise.

gcc/testsuite/ChangeLog:

	PR testsuite/100569
	* gcc.dg/atomic/c11-atomic-exec-2.c: Prune new LTO warning.
	* gcc.dg/torture/pr94947-1.c: Likewise.
2021-05-13 09:24:23 +02:00
Roman Zhuykov
4cf3b10f27 modulo-sched: skip loops with strange register defs [PR100225]
PR84878 fix adds an assertion which can fail, e.g. when stack pointer
is adjusted inside the loop.  We have to prevent it and search earlier
for any 'strange' instruction.  The solution is to skip the whole loop
if using 'note_stores' we found that one of hard registers is in
'df->regular_block_artificial_uses' set.

Also patch properly prohibit not single-set instruction in loop body.

gcc/ChangeLog:

	PR rtl-optimization/100225
	PR rtl-optimization/84878
	* modulo-sched.c (sms_schedule): Use note_stores to skip loops
	where we have an instruction which touches (writes) any hard
	register from df->regular_block_artificial_uses set.
	Allow not-single-set instruction only right before basic block
	tail.

gcc/testsuite/ChangeLog:

	PR rtl-optimization/100225
	PR rtl-optimization/84878
	* gcc.dg/pr100225.c: New test.

libgomp/ChangeLog:

	* testsuite/libgomp.oacc-c-c++-common/atomic_capture-3.c: New test.
2021-04-30 11:08:03 +03:00
Thomas Schwinge
22cff118f7 Add '-Wopenacc-parallelism'
... to diagnose potentially suboptimal choices regarding OpenACC parallelism.

Not enabled by default: too noisy ("*potentially* suboptimal choices"); see
XFAILed 'dg-bogus'es.

	gcc/c-family/
	* c.opt (Wopenacc-parallelism): New.
	gcc/fortran/
	* lang.opt (Wopenacc-parallelism): New.
	gcc/
	* omp-offload.c (oacc_validate_dims): Implement
	'-Wopenacc-parallelism'.
	* doc/invoke.texi (-Wopenacc-parallelism): Document.
	gcc/testsuite/
	* c-c++-common/goacc/diag-parallelism-1.c: New.
	* c-c++-common/goacc/acc-icf.c: Specify '-Wopenacc-parallelism',
	and match diagnostics, as appropriate.
	* c-c++-common/goacc/classify-kernels-unparallelized.c: Likewise.
	* c-c++-common/goacc/classify-kernels.c: Likewise.
	* c-c++-common/goacc/classify-parallel.c: Likewise.
	* c-c++-common/goacc/classify-routine.c: Likewise.
	* c-c++-common/goacc/classify-serial.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-2.c: Likewise.
	* c-c++-common/goacc/parallel-dims-1.c: Likewise.
	* c-c++-common/goacc/parallel-reduction.c: Likewise.
	* c-c++-common/goacc/pr70688.c: Likewise.
	* c-c++-common/goacc/routine-1.c: Likewise.
	* c-c++-common/goacc/routine-level-of-parallelism-2.c: Likewise.
	* c-c++-common/goacc/uninit-dim-clause.c: Likewise.
	* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Likewise.
	* gfortran.dg/goacc/classify-kernels.f95: Likewise.
	* gfortran.dg/goacc/classify-parallel.f95: Likewise.
	* gfortran.dg/goacc/classify-routine.f95: Likewise.
	* gfortran.dg/goacc/classify-serial.f95: Likewise.
	* gfortran.dg/goacc/kernels-decompose-1.f95: Likewise.
	* gfortran.dg/goacc/kernels-decompose-2.f95: Likewise.
	* gfortran.dg/goacc/parallel-tree.f95: Likewise.
	* gfortran.dg/goacc/routine-4.f90: Likewise.
	* gfortran.dg/goacc/routine-level-of-parallelism-1.f90: Likewise.
	* gfortran.dg/goacc/routine-module-mod-1.f90: Likewise.
	* gfortran.dg/goacc/routine-multiple-directives-1.f90: Likewise.
	* gfortran.dg/goacc/uninit-dim-clause.f95: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c: Specify
	'-Wopenacc-parallelism', and match diagnostics, as appropriate.
	* testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/mode-transitions.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/pr85381-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/private-variables.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-g-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-wv-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/static-variable-1.c:
	Likewise.
	* testsuite/libgomp.oacc-fortran/optional-private.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/par-reduction-2-1.f: Likewise.
	* testsuite/libgomp.oacc-fortran/par-reduction-2-2.f: Likewise.
	* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/parallel-reduction.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/pr84028.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/private-variables.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-5.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-6.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/routine-7.f90: Likewise.

Co-Authored-By: Nathan Sidwell <nathan@codesourcery.com>
Co-Authored-By: Tom de Vries <vries@codesourcery.com>
Co-Authored-By: Julian Brown <julian@codesourcery.com>
Co-Authored-By: Kwok Cheung Yeung <kcy@codesourcery.com>
2021-04-26 12:32:00 +02:00
Thomas Schwinge
7c640779bf [OpenACC] Don't compile libgomp testcases with '-w'
We'd like to actually catch compiler diagnostics (and currently there aren't
any).

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c: Don't
	compile with '-w'.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-6.c: Likewise.
	* testsuite/libgomp.oacc-fortran/parallel-reduction.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-5.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-6.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-7.f90: Likewise.
2021-04-26 12:05:53 +02:00
Thomas Schwinge
3395dfc4da [OpenACC 'kernels'] '-fopenacc-kernels=[...]' -> '--param=openacc-kernels=[...]'
This configuration knob is temporary, and isn't really meant to be exposed to
users.

	gcc/
	* params.opt (-param=openacc-kernels=): Add.
	* omp-oacc-kernels-decompose.cc
	(pass_omp_oacc_kernels_decompose::gate): Use it.
	* doc/invoke.texi (-fopenacc-kernels=@var{mode}): Move...
	(--param): ... here, 'openacc-kernels'.
	gcc/c-family/
	* c.opt (fopenacc-kernels=): Remove.
	gcc/fortran/
	* lang.opt (fopenacc-kernels=): Remove.
	gcc/testsuite/
	* c-c++-common/goacc/if-clause-2.c: '-fopenacc-kernels=[...]' ->
	'--param=openacc-kernels=[...]'.
	* c-c++-common/goacc/kernels-decompose-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-2.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-ice-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-ice-2.c: Likewise.
	* gfortran.dg/goacc/kernels-decompose-1.f95: Likewise.
	* gfortran.dg/goacc/kernels-decompose-2.f95: Likewise.
	* gfortran.dg/goacc/kernels-tree.f95: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
	'-fopenacc-kernels=[...]' -> '--param=openacc-kernels=[...]'.
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Likewise.
	* testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise.
2021-04-19 14:29:48 +02:00
Hafiz Abid Qadeer
ac200799ac [OpenACC] Fix an ICE where a loop with GT condition is collapsed.
We have seen an ICE both on trunk and devel/omp/gcc-10 branches which can
be reprodued with this simple testcase.  It occurs if an OpenACC loop has
a collapse clause and any of the loop being collapsed uses GT or GE
condition.  This issue is specific to OpenACC.

int main (void)
{
  int ix, iy;
  int dim_x = 16, dim_y = 16;
  {
       for (iy = dim_y - 1; iy > 0; --iy)
       for (ix = dim_x - 1; ix > 0; --ix)
        ;
  }
}

The problem is caused by a failing assertion in expand_oacc_collapse_init.
It checks that cond_code for fd->loop should be same as cond_code for all
the loops that are being collapsed.  As the cond_code for fd->loop is
LT_EXPR with collapse clause (set at the end of omp_extract_for_data),
this assertion forces that all the loop in collapse clause should use
< operator.

There does not seem to be anything in the code which demands this
condition as loop with > condition works ok otherwise.  I digged old
mailing list a bit but could not find any discussion on this change.
Looking at the code, expand_oacc_for checks that fd->loop->cond_code is
either LT_EXPR or GT_EXPR.  I guess the original intention was to have
similar checks on the loop which are being collapsed. But the way check
was written does not acheive that.

I have fixed it by modifying the check in the assertion to be same as
check on fd->loop->cond_code.

I tested goacc and libgomp (with nvptx offloading) and did not see any
regression.  I have added new tests to check collapse with GT/GE condition.

	PR middle-end/98088
	gcc/
	* omp-expand.c (expand_oacc_collapse_init): Update condition in
	a gcc_assert.

	gcc/testsuite/
	* c-c++-common/goacc/collapse-2.c: New.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/collapse-2.c: Add check
	for loop with GT/GE condition.
	* testsuite/libgomp.oacc-c-c++-common/collapse-3.c: Likewise.
2021-04-11 14:44:22 +01:00
Thomas Schwinge
ffa0ae6eee Add 'libgomp.oacc-c-c++-common/static-variable-1.c' [PR84991, PR84992, PR90779]
libgomp/
	PR middle-end/84991
	PR middle-end/84992
	PR middle-end/90779
	* testsuite/libgomp.oacc-c-c++-common/static-variable-1.c: New.
2021-04-09 17:28:32 +02:00
Thomas Schwinge
0cab70604c Fix templatized C++ OpenACC 'cache' directive ICEs
This has been broken forever, whoops...

	gcc/cp/
	* pt.c (tsubst_omp_clauses): Handle 'OMP_CLAUSE__CACHE_'.
	(tsubst_expr): Handle 'OACC_CACHE'.
	gcc/testsuite/
	* c-c++-common/goacc/cache-1.c: Update.
	* c-c++-common/goacc/cache-2.c: Likewise.
	* g++.dg/goacc/cache-1.C: New.
	* g++.dg/goacc/cache-2.C: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c++/cache-1.C: New.
	* testsuite/libgomp.oacc-c-c++-common/cache-1.c: Update.
2020-11-25 19:57:39 +01:00
Thomas Schwinge
f72175357d [testsuite] Avoid Tcl 8.5-specific behavior
gcc/
	* doc/install.texi (Prerequisites) <Tcl>: Add comment.
	gcc/testsuite/
	* c-c++-common/goacc/kernels-decompose-1.c: Avoid Tcl 8.5-specific
	behavior.
	* c-c++-common/goacc/kernels-decompose-2.c: Likewise.
	* gfortran.dg/goacc/kernels-decompose-1.f95: Likewise.
	* gfortran.dg/goacc/kernels-decompose-2.f95: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Avoid
	Tcl 8.5-specific behavior.
	* testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise.

Reported-by: David Edelsohn <dje.gcc@gmail.com>
2020-11-24 10:29:35 +01:00
Gergö Barany
e898ce7997 Decompose OpenACC 'kernels' constructs into parts, a sequence of compute constructs
Not yet enabled by default: for now, the current mode of OpenACC 'kernels'
constructs handling still remains '-fopenacc-kernels=parloops', but that is to
change later.

	gcc/
	* omp-oacc-kernels-decompose.cc: New.
	* Makefile.in (OBJS): Add it.
	* passes.def: Instantiate it.
	* tree-pass.h (make_pass_omp_oacc_kernels_decompose): Declare.
	* flag-types.h (enum openacc_kernels): Add.
	* doc/invoke.texi (-fopenacc-kernels): Document.
	* gimple.h (enum gf_mask): Add
	'GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED',
	'GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE',
	'GF_OMP_TARGET_KIND_OACC_DATA_KERNELS'.
	(is_gimple_omp_oacc, is_gimple_omp_offloaded): Handle these.
	* gimple-pretty-print.c (dump_gimple_omp_target): Likewise.
	* omp-expand.c (expand_omp_target, build_omp_regions_1)
	(omp_make_gimple_edges): Likewise.
	* omp-low.c (scan_sharing_clauses, scan_omp_for)
	(check_omp_nesting_restrictions, lower_oacc_reductions)
	(lower_oacc_head_mark, lower_omp_target): Likewise.
	* omp-offload.c (execute_oacc_device_lower): Likewise.
	gcc/c-family/
	* c.opt (fopenacc-kernels): Add.
	gcc/fortran/
	* lang.opt (fopenacc-kernels): Add.
	gcc/testsuite/
	* c-c++-common/goacc/kernels-decompose-1.c: New.
	* c-c++-common/goacc/kernels-decompose-2.c: New.
	* c-c++-common/goacc/kernels-decompose-ice-1.c: New.
	* c-c++-common/goacc/kernels-decompose-ice-2.c: New.
	* gfortran.dg/goacc/kernels-decompose-1.f95: New.
	* gfortran.dg/goacc/kernels-decompose-2.f95: New.
	* c-c++-common/goacc/if-clause-2.c: Adjust.
	* gfortran.dg/goacc/kernels-tree.f95: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
	New.
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Adjust.
	* testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise.

Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
2020-11-13 22:58:57 +01:00
Thomas Schwinge
79680c1d5c Simplify and enhance 'libgomp.oacc-c-c++-common/pr85486*.c' [PR85486]
Avoid code duplication, and better test what we expect to happen.

	libgomp/
	PR target/85486
	* testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: Simplify and enhance.
	* testsuite/libgomp.oacc-c-c++-common/pr85486-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/pr85486.c: Likewise.
2020-11-02 14:20:01 +01:00
Tom de Vries
3f2e15c2e6 [openacc] Fix acc declare for VLAs
Consider test-case test.c, with VLA A:
...
int main (void) {
  int N = 1000;
  int A[N];
  #pragma acc declare copy(A)
  return 0;
}
...
compiled using:
...
$ gcc test.c -fopenacc -S -fdump-tree-all
...

At original, we have:
...
  #pragma acc declare map(tofrom:A);
...
but at gimple, we have a map (to:A.1), but not a map (from:A.1):
...
  int[0:D.2074] * A.1;

  {
    int A[0:D.2074] [value-expr: *A.1];

    saved_stack.2 = __builtin_stack_save ();
    try
      {
        A.1 = __builtin_alloca_with_align (D.2078, 32);
        #pragma omp target oacc_declare map(to:(*A.1) [len: D.2076])
      }
    finally
      {
        __builtin_stack_restore (saved_stack.2);
      }
  }
...

This is caused by the following incompatibility.  When storing the desired
from clause in oacc_declare_returns, we use 'A.1' as the key:
...
10898                 oacc_declare_returns->put (decl, c);
(gdb) call debug_generic_expr (decl)
A.1
(gdb) call debug_generic_expr (c)
map(from:(*A.1))
...
but when looking it up, we use 'A' as the key:
...
(gdb)
1471                  tree *c = oacc_declare_returns->get (t);
(gdb) call debug_generic_expr (t)
A
...

Fix this by extracing the 'A.1' lookup key from 'A' using the decl-expr.

In addition, unshare the looked up value, to fix avoid running into
an "incorrect sharing of tree nodes" error.

Using these two fixes, we get our desired:
...
     finally
       {
+        #pragma omp target oacc_declare map(from:(*A.1))
         __builtin_stack_restore (saved_stack.2);
       }
...

Build on x86_64-linux with nvptx accelerator, tested libgomp.

gcc/ChangeLog:

2020-10-06  Tom de Vries  <tdevries@suse.de>

	PR middle-end/90861
	* gimplify.c (gimplify_bind_expr): Handle lookup in
	oacc_declare_returns using key with decl-expr.

libgomp/ChangeLog:

2020-10-06  Tom de Vries  <tdevries@suse.de>

	PR middle-end/90861
	* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Remove xfail.
2020-10-06 16:50:22 +02:00