Add some test-cases that set vector length using -fopenacc-dim.
2019-01-12 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: New test.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c: New test.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c: New test.
* testsuite/libgomp.oacc-fortran/gemm-2.f90: New test.
From-SVN: r267897
Add some test-cases using vector_length 64.
2019-01-12 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.oacc-c-c++-common/vector-length-64-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/vector-length-64-2.c: New test.
* testsuite/libgomp.oacc-c-c++-common/vector-length-64-3.c: New test.
From-SVN: r267895
Add test-cases for "[nvptx] Force vl32 if calling vector-partitionable
routines".
2019-01-12 Tom de Vries <tdevries@suse.de>
PR target/85486
* testsuite/libgomp.oacc-c-c++-common/pr85486-3.c: New test.
* testsuite/libgomp.oacc-c-c++-common/pr85486.c: New test.
From-SVN: r267894
Add test-cases for PR85381.
2019-01-12 Tom de Vries <tdevries@suse.de>
PR target/85381
* testsuite/libgomp.oacc-c-c++-common/pr85381-5.c: New test.
* testsuite/libgomp.oacc-c-c++-common/pr85381.c: New test.
From-SVN: r267893
Add various reduction test-cases with vector length 128.
2019-01-12 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: New test.
* testsuite/libgomp.oacc-fortran/gemm.f90: New test.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-10.c: New test.
From-SVN: r267892
Add various test-cases with vector length 128.
2019-01-12 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: New test.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c: New test.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c: New test.
From-SVN: r267891
Update message in nvptx libgomp plugin about insufficient resources to launch
kernel, to accommodate for the fact the vector_length can now be variable.
2019-01-12 Tom de Vries <tdevries@suse.de>
* plugin/plugin-nvptx.c (nvptx_exec): Update insufficient hardware
resources diagnostic.
From-SVN: r267890
Allow vector_length clauses to accept values larger than warp size. Note that
this does not enable setting vector_length to values larger than warp size using
-fopenacc-dim.
2019-01-12 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.c (nvptx_goacc_validate_dims): Take larger vector
lengths into account.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c: Expect
vector length to be 128.
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Expect vector
length 2097152 to be reduced to 1024 instead of 32.
From-SVN: r267889
When using a compiler build with:
...
+#define PTX_DEFAULT_VECTOR_LENGTH PTX_CTA_SIZE
...
consider a test-case:
...
int
main (void)
{
#pragma acc parallel vector_length (64)
#pragma acc loop worker
for (unsigned int i = 0; i < 32; i++)
#pragma acc loop vector
for (unsigned int j = 0; j < 64; j++)
;
return 0;
}
...
If num_workers is 16, either because:
- we add a "num_workers (16)" clause on the parallel directive, or
- we set "GOMP_OPENACC_DIM=:16:", or
- the libgomp plugin chooses 16 num_workers
we run into an illegal instruction at runtime, because a bar.sync instruction
tries to use a barrier 16. The instruction is illegal, because ptx supports
only 16 barriers per CTA, and the valid range is 0..15.
The problem is that with a warp-multiple vector length, we use a code generation
scheme with a per-worker barrier. And because barrier zero is reserved for
per-cta barrier, only the remaining 15 barriers can be used as per-worker
barrier, and consequently we can't use num_workers larger than 15.
This problem occurs only for vector_length 64. For vector_length 32, we use a
different code generation scheme, and for vector_length >= 96, the maximum
num_workers is not big enough not to trigger this problem.
Also, this problem only occurs for num_workers 16. As explained above,
num_workers 15 is safe to use, and 16 is already the maximum num_workers for
vector_length 64.
This patch fixes the problem in both the compiler (handling "num_workers (16)")
and in the libgomp nvptx plugin (with and without "GOMP_OPENACC_DIM=:16:").
2019-01-11 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.c (PTX_CTA_NUM_BARRIERS, PTX_PER_CTA_BARRIER)
(PTX_NUM_PER_CTA_BARRIER, PTX_FIRST_PER_WORKER_BARRIER)
(PTX_NUM_PER_WORKER_BARRIERS): Define.
(nvptx_apply_dim_limits): Prevent vector_length 64 and
num_workers 16.
* plugin/plugin-nvptx.c (nvptx_exec): Prevent vector_length 64 and
num_workers 16.
From-SVN: r267838
Before the commit "[libgomp, testsuite, openacc] Don't use const int for
dimensions", the "const int" construct was used to set launch dimensions in
reductions-[1-5].c. In the case of -xc -O0, the const int is implemented as a
variable by the C front-end. Consequently, the nvptx back-end generated
warnings that vector_length was overridden to be hard-coded, rather than left to
be set at runtime. The test-cases silenced these warnings by switching off all
warnings in the accelerator compiler using "-foffload=-w".
Given that no warnings occur anymore, remove the "-foffload=-w" setting.
2019-01-11 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.oacc-c-c++-common/reduction-1.c: Remove
-foffload=-w.
* testsuite/libgomp.oacc-c-c++-common/reduction-2.c: Same.
* testsuite/libgomp.oacc-c-c++-common/reduction-3.c: Same.
* testsuite/libgomp.oacc-c-c++-common/reduction-4.c: Same.
* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Same.
From-SVN: r267836
Add a test-case that tests the "insufficient resources" fatal in the nvptx
libgomp plugin.
2019-01-11 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.oacc-c-c++-common/insufficient-resources.c: New
test.
From-SVN: r267835
libgomp/
* config/rtems/affinity-fmt.c: New file. Include affinity-fmt.c,
undefining HAVE_GETPID and HAVE_GETHOSTNAME, and mapping fwrite to
write.
From-SVN: r267751
Const int is handled differently at -O0 for -xc and -xc++, which can cause noise
in testsuite/libgomp.oacc-c-c++-common test-cases (which are both run for c and
c++) if const int is used for launch dimensions.
Fix this by using #defines instead.
2019-01-09 Tom de Vries <tdevries@suse.de>
PR target/88756
* testsuite/libgomp.oacc-c-c++-common/reduction-1.c (ng, nw, vl): Use
#define instead of "const int".
* testsuite/libgomp.oacc-c-c++-common/reduction-2.c (ng, nw, vl): Same.
* testsuite/libgomp.oacc-c-c++-common/reduction-3.c (ng, nw, vl): Same.
* testsuite/libgomp.oacc-c-c++-common/reduction-4.c (ng, nw, vl): Same.
* testsuite/libgomp.oacc-c-c++-common/reduction-5.c (ng, nw, vl): Same.
From-SVN: r267747
When using a compiler build with:
...
+#define PTX_DEFAULT_VECTOR_LENGTH PTX_CTA_SIZE
+#define PTX_MAX_VECTOR_LENGTH PTX_CTA_SIZE
...
and running the libgomp testsuite, we run into an execution failure in
parallel-loop-1.c, due to a cuda launch failure:
...
nvptx_exec: kernel f6_none_none$_omp_fn$0: launch gangs=480, workers=0, \
vectors=1024
libgomp: cuLaunchKernel error: invalid argument
...
because workers == 0.
The workers variable is set to 0 here in nvptx_exec:
...
workers = blocks / actual_vectors;
...
because actual_vectors is 1024, and blocks is 768:
...
cuOccupancyMaxPotentialBlockSize: grid = 10, block = 768
...
Fix this by ensuring that workers is at least one.
2019-01-09 Tom de Vries <tdevries@suse.de>
* plugin/plugin-nvptx.c (nvptx_exec): Make sure to launch with at least
one worker.
From-SVN: r267746
The vector-length-128-3.c test-case uses GOMP_OPENACC_DIM=-:-:128, but '-' is
not yet supported on trunk. Use GOMP_OPENACC_DIM=::128 instead.
2019-01-07 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c: Fix
GOMP_OPENACC_DIM argument.
From-SVN: r267624
Add a couple of test-cases using vector length 128, while checking that we
override to vector length 32.
2019-01-03 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c: New test.
From-SVN: r267559
... so that we're then able to use this for other flags in addition to
"GOACC_FLAG_HOST_FALLBACK".
gcc/
* omp-expand.c (expand_omp_target): Restructure OpenACC vs. OpenMP
code paths. Update for libgomp OpenACC entry points change.
include/
* gomp-constants.h (GOACC_FLAG_HOST_FALLBACK)
(GOACC_FLAGS_MARSHAL_OP, GOACC_FLAGS_UNMARSHAL): Define.
libgomp/
* oacc-parallel.c (GOACC_parallel_keyed, GOACC_parallel)
(GOACC_data_start, GOACC_enter_exit_data, GOACC_update)
(GOACC_declare): Redefine the "device" argument to "flags".
From-SVN: r267448
libgomp/
* target.c (struct gomp_coalesce_chunk): New structure.
(struct gomp_coalesce_buf): Update the chunks member to use that
type. Adjust all users.
Co-Authored-By: Cesar Philippidis <cesar@codesourcery.com>
From-SVN: r267446
2018-12-27 Steven G. Kargl <kargl@gcc.gnu.org>
* libgomp.fortran/aligned1.f03: Fix invalid code that now causes
an error after r267415.
From-SVN: r267436
Add pr85381*.c test-cases that are already passing without the fix for PR85381.
Build and reg-tested on x86_64 with nvptx accelerator.
2018-12-19 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.oacc-c-c++-common/pr85381-2.c: New test.
* testsuite/libgomp.oacc-c-c++-common/pr85381-3.c: New test.
* testsuite/libgomp.oacc-c-c++-common/pr85381-4.c: New test.
From-SVN: r267268
The goacc.exp test-cases nvptx-merged-loop.c and nvptx-sese-1.c are failing
during linking due to missing libgomp.spec.
Move them to the libgomp testsuite.
Build and reg-tested on x86_64 with nvptx accelerator.
2018-12-19 Tom de Vries <tdevries@suse.de>
* gcc.dg/goacc/nvptx-merged-loop.c: Move to
libgomp/testsuite/libgomp.oacc-c-c++-common.
* gcc.dg/goacc/nvptx-sese-1.c: Same.
* testsuite/lib/libgomp.exp: Add load_lib of scanoffloadrtl.exp.
* testsuite/libgomp.oacc-c-c++-common/nvptx-merged-loop.c: Move from
gcc/testsuite/gcc.dg/goacc.
* testsuite/libgomp.oacc-c-c++-common/nvptx-sese-1.c: Same.
From-SVN: r267267
Most of that patch's changes were already committed as part of r261813 "Update
OpenACC data clause semantics to the 2.5 behavior", but not all of them.
libgomp/
* oacc-mem.c (acc_present_or_create): Remove definition and change
to alias of acc_create.
(acc_present_or_copyin): Remove definition and change to alias of
acc_copyin.
* oacc-parallel.c (GOACC_enter_exit_data): Call acc_create instead
of acc_present_or_create.
* testsuite/libgomp.oacc-c-c++-common/data-already-1.c: Remove.
* testsuite/libgomp.oacc-c-c++-common/data-already-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/data-already-3.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/data-already-4.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/data-already-5.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/data-already-6.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/data-already-7.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/data-already-8.c: Likewise.
* testsuite/libgomp.oacc-fortran/data-already-1.f: Likewise.
* testsuite/libgomp.oacc-fortran/data-already-2.f: Likewise.
* testsuite/libgomp.oacc-fortran/data-already-3.f: Likewise.
* testsuite/libgomp.oacc-fortran/data-already-4.f: Likewise.
* testsuite/libgomp.oacc-fortran/data-already-5.f: Likewise.
* testsuite/libgomp.oacc-fortran/data-already-6.f: Likewise.
* testsuite/libgomp.oacc-fortran/data-already-7.f: Likewise.
* testsuite/libgomp.oacc-fortran/data-already-8.f: Likewise.
Co-Authored-By: Chung-Lin Tang <cltang@codesourcery.com>
From-SVN: r267153
An OpenACC async queue is always synchronized with itself, so invocations like
"#pragma acc wait(0) async(0)", or "acc_wait_async (0, 0)" don't make a lot of
sense, but are still valid.
libgomp/
PR libgomp/88495
* plugin/plugin-nvptx.c (nvptx_wait_async): Don't refuse
"identical parameters".
* testsuite/libgomp.oacc-c-c++-common/asyncwait-nop-1.c: Update.
* testsuite/libgomp.oacc-c-c++-common/lib-80.c: Remove.
From-SVN: r267152
We don't correctly handle "#pragma acc wait async (a)" for "a >= 0", handling
as a no-op whereas it should enqueue the appropriate wait operations on
"async (a)".
libgomp/
PR libgomp/88484
* oacc-parallel.c (GOACC_wait): Correct handling for "async >= 0".
* testsuite/libgomp.oacc-c-c++-common/asyncwait-nop-1.c: New file.
From-SVN: r267151
These are meant to be functionally equivalent (but no longer are), just using
different means. Also, use the OpenACC "*_async" functions recently added.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/data-2-lib.c: Revise.
* testsuite/libgomp.oacc-c-c++-common/data-2.c: Likewise.
From-SVN: r267149
Per my reading of the OpenACC specification (and as supported by secondary
documentation, such as code examples, or presentations), it's valid to call
"acc_get_cuda_stream"/"acc_set_cuda_stream" also with "acc_async_sync",
"acc_async_noval" arguments, not just with the nonnegative values as currently
implemented.
libgomp/
PR libgomp/88370
* libgomp.texi (acc_get_current_cuda_context, acc_get_cuda_stream)
(acc_set_cuda_stream): Clarify.
* oacc-cuda.c (acc_get_cuda_stream, acc_set_cuda_stream): Use
"async_valid_p".
* plugin/plugin-nvptx.c (nvptx_set_cuda_stream): Refuse "async ==
acc_async_sync".
* testsuite/libgomp.oacc-c-c++-common/acc_set_cuda_stream-1.c: New file.
* testsuite/libgomp.oacc-c-c++-common/async_queue-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-84.c: Update.
* testsuite/libgomp.oacc-c-c++-common/lib-85.c: Likewise.
From-SVN: r267147
When compiling an OpenMP or OpenACC program containing a reference in the
offloaded code to a symbol that has not been included in the offloaded code,
the offloading compiler may ICE in lto1.
Fix this by erroring out instead, mentioning the problematic symbol:
...
error: variable 'var' has been referenced in offloaded code but hasn't
been marked to be included in the offloaded code
lto1: fatal error: errors during merging of translation units
compilation terminated.
...
Build x86_64 with nvptx accelerator and reg-tested libgomp.
Build x86_64 and reg-tested libgomp.
2018-12-14 Tom de Vries <tdevries@suse.de>
* lto-cgraph.c (verify_node_partition): New function.
(input_overwrite_node, input_varpool_node): Use verify_node_partition.
* testsuite/libgomp.c-c++-common/function-not-offloaded-aux.c: New test.
* testsuite/libgomp.c-c++-common/function-not-offloaded.c: New test.
* testsuite/libgomp.c-c++-common/variable-not-offloaded.c: New test.
* testsuite/libgomp.oacc-c-c++-common/function-not-offloaded.c: New test.
* testsuite/libgomp.oacc-c-c++-common/variable-not-offloaded.c: New test.
From-SVN: r267134
Libgomp test-case libgomp.c/target-5.c is failing to compile when building for
x86_64 with nvptx accelerator due to missing:
- getpid
- gethostname
- isatty (pulled in by fwrite)
in the nvptx newlib.
This patch fixes the build failure by:
- adding a function gomp_print_string which limits the use of fwrite to a single
location (in affinity-fmt.c), and
- creating an nvptx version of affinity-fmt.c, which:
- overrides the configure test results HAVE_GETPID and HAVE_GETHOSTNAME, and
- implements fwrite using write.
Build and reg-tested on x86_64 with nvptx accelerator.
2018-12-13 Tom de Vries <tdevries@suse.de>
* affinity-fmt.c (gomp_print_string): New function, factored out of ...
(omp_display_affinity, gomp_display_affinity_thread): ... here, and ...
* fortran.c (omp_display_affinity_): ... here.
* libgomp.h (gomp_print_string): Declare.
* config/nvptx/affinity-fmt.c: New file. Include affinity-fmt.c,
undefining HAVE_GETPID and HAVE_GETHOSTNAME, and mapping fwrite to
write.
From-SVN: r267100
PR libgomp/88460
* testsuite/libgomp.c++/for-24.C (results): Include it in
omp declare target region.
(main): Use map (always, tofrom: results) instead of
map (tofrom: results).
From-SVN: r267093
PR fortran/88463
* trans-openmp.c (gfc_omp_predetermined_sharing): Handle TREE_READONLY
VAR_DECLs with DECL_EXTERNAL like those with TREE_STATIC.
* testsuite/libgomp.fortran/pr88463-1.f90: New test.
* testsuite/libgomp.fortran/pr88463-2.f90: New test.
From-SVN: r267069
2018-12-09 Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/88411
* io/transfer.c (dta_transfer_init): Do not treat as an
asynchronous statement unless the statement has
ASYNCHRONOUS="YES".
(st_write_done): Likewise.
(st_read_done): Do not perform async_wait for synchronous I/O
on an async unit.
(st_read_done): Likewise.
2018-12-09 Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/88411
* testsuite/libgomp.fortran/async_io_8.f90: New test.
From-SVN: r266929
PR libgomp/87995
* testsuite/libgomp.c-c++-common/cancel-taskgroup-3.c: Require
tls_runtime effective target.
(t): New threadprivate variable.
(main): Set t in threads which execute iterations of the worksharing
loop. Propagate that to the task after the loop and don't abort
if the current taskgroup hasn't been cancelled.
From-SVN: r266904
* omp-low.c (check_omp_nesting_restrictions): Allow cancel or
cancellation point with taskgroup clause inside of taskloop. Consider
a taskloop construct without nogroup clause as implicit taskgroup for
diagnostics if cancel/cancellation point with taskgroup clause is
closely nested inside of taskgroup region.
* c-c++-common/gomp/cancel-1.c (f2): Add various taskloop related
tests.
* testsuite/libgomp.c-c++-common/cancel-taskgroup-4.c: New test.
From-SVN: r266722
Make libgomp respect the on device offset of subarrays which may arise in
present data clauses.
libgomp/
PR libgomp/88288
* oacc-parallel.c (GOACC_parallel_keyed): Add offset to devaddrs.
* testsuite/libgomp.oacc-c-c++-common/pr88288.c: New test.
Reviewed-by: Thomas Schwinge <thomas@codesourcery.com>
From-SVN: r266688
This is a copy of libgomp.oacc-fortran/lib-16.f90, but does 'include
"openacc_lib.h"' instead of 'use openacc'.
libgomp/
* testsuite/libgomp.oacc-fortran/lib-16-2.f90: New file.
From-SVN: r266683
2018-10-19 Richard Biener <rguenther@suse.de>
PR tree-optimization/88182
* g++.dg/gomp/pr88182.C: Move from libgomp and use -fopenmp-simd.
* testsuite/libgomp.c++/pr88182.C: Move to g++.dg/gomp.
From-SVN: r266545
* testsuite/Makefile.am (AUTOMAKE_OPTIONS): Drop dejagnu.
(RUNTEST): Don't define.
(RUNTESTDEFAULTFLAGS): Add.
(check-DEJAGNU, site.exp, distclean-DEJAGNU): New goals.
(distclean-am): Depend on distclean-DEJAGNU.
(check-am): If -j% option is present in MFLAGS and if
`getconf _NPROCESSORS_ONLN` is more than 8, export OMP_NUM_THREADS=8.
(.PHONY): Add check-DEJAGNU and distclean-DEJAGNU.
* testsuite/Makefile.in: Regenerated.
From-SVN: r266482