gcc/libgomp
Tom de Vries ab3f4b27ab [omp, ftracer] Don't duplicate blocks in SIMT region
When running the libgomp testsuite on x86_64-linux with nvptx accelerator on
the test-case included in this patch, we run into:
...
FAIL: libgomp.fortran/pr95654.f90 -O3 -fomit-frame-pointer -funroll-loops \
  -fpeel-loops -ftracer -finline-functions  execution test
...

The test-case is a minimal version of this FAIL:
...
FAIL: libgomp.fortran/pr66199-5.f90 -O3 -fomit-frame-pointer -funroll-loops \
  -fpeel-loops -ftracer -finline-functions  execution test
...
but that one has stopped failing at commit c2ebf4f10d "openmp: Add support
for non-rect simd and improve collapsed simd support".

The problem is that ftracer duplicates a block containing GOMP_SIMT_VOTE_ANY.

That is, before ftracer we have (dropping the GOMP_SIMT_ prefix):
...
bb4(ENTER_ALLOC)
*----------+
|           \
|            \
|             v
|             *
v             bb8
*<------------*
bb5(VOTE_ANY)
*-------------+
|             |
|             |
|             |
|             |
|             v
|             *
v             bb7(XCHG_IDX)
*<------------*
bb6(EXIT)
...

The XCHG_IDX internal-fn does inter-SIMT-lane communication, which for nvptx
maps onto shfl, an instruction that requires the warp executing it to be
convergent.  The warp diverges at bb4 and reconverges at bb5.  VOTE_ANY
yields the same value in every lane, so the warp does not diverge by going
to bb7, and the shfl is indeed executed by a convergent warp.

After ftracer, we have:
...
bb4(ENTER_ALLOC)
*----------+
|           \
|            \
|             \
|              \
v               v
*               *
bb5(VOTE_ANY)   bb8(VOTE_ANY)
*               *
|\             /|
| \  +--------+ |
|  \/           |
|  /\           |
| /  +----------v
|/              *
v               bb7(XCHG_IDX)
*<--------------*
bb6(EXIT)
...

The warp diverges again at bb4, but does not reconverge before bb6, so the
shfl in bb7 is executed by a divergent warp, which causes the FAIL.

Fix this by making ftracer ignore blocks containing ENTER_ALLOC, VOTE_ANY and
EXIT, effectively treating the SIMT region conservatively.

An argument can be made that the test needs to be added in a more
generic place, like gimple_can_duplicate_bb_p or some such, and that ftracer
then needs to use the generic test.  But that's a discussion with a much
broader scope, so I'm leaving that for another patch.

Bootstrapped and reg-tested on x86_64-linux.

Built on x86_64-linux with nvptx accelerator, tested with libgomp.

gcc/ChangeLog:

2020-10-05  Tom de Vries  <tdevries@suse.de>

	PR fortran/95654
	* tracer.c (ignore_bb_p): Ignore GOMP_SIMT_ENTER_ALLOC,
	GOMP_SIMT_VOTE_ANY and GOMP_SIMT_EXIT.

libgomp/ChangeLog:

2020-10-05  Tom de Vries  <tdevries@suse.de>

	PR fortran/95654
	* testsuite/libgomp.fortran/pr95654.f90: New test.
2020-10-05 08:53:11 +02:00
config libgomp: disable barriers in nested teams 2020-09-29 11:48:04 +01:00
plugin [libgomp, nvptx] Print error log for link error 2020-09-22 13:38:00 +02:00
testsuite [omp, ftracer] Don't duplicate blocks in SIMT region 2020-10-05 08:53:11 +02:00
.gitattributes libgomp: Fixes + cleanup for OpenACC's Fortran module + openacc_lib.h 2020-02-19 09:13:44 +01:00
acc_prof.h
acinclude.m4
aclocal.m4 libgomp: Regenerate configure files with automake 1.15.1 2020-10-02 12:08:47 +02:00
affinity-fmt.c re PR libgomp/93219 (unused return value in affinity-fmt.c) 2020-01-10 21:42:00 +01:00
affinity.c
alloc.c
allocator.c libgomp: Add Fortran routine support for allocators 2020-07-15 08:33:20 +02:00
atomic.c
barrier.c
ChangeLog Daily bump. 2020-10-03 00:16:25 +00:00
ChangeLog.graphite
config.h.in Removal of HSA offloading from gcc and libgomp 2020-08-03 18:13:00 +02:00
configure libgomp: Regenerate configure files with automake 1.15.1 2020-10-02 12:08:47 +02:00
configure.ac libomp: Add omp_depend_kind to omp_lib.{f90,h} 2020-07-23 15:02:15 +02:00
configure.tgt aix: Add GCC64 configuration and FAT target libraries. 2020-06-21 14:14:46 -04:00
critical.c
env.c openmp: Add basic library allocator support. 2020-05-19 10:11:01 +02:00
error.c
fortran.c libgomp: Add Fortran routine support for allocators 2020-07-15 08:33:20 +02:00
hashtab.h
icv-device.c
icv.c libgomp: Add Fortran routine support for allocators 2020-07-15 08:33:20 +02:00
iter_ull.c
iter.c
libgomp_f.h.in libomp: Add omp_depend_kind to omp_lib.{f90,h} 2020-07-23 15:02:15 +02:00
libgomp_g.h
libgomp-plugin.c
libgomp-plugin.h OpenACC 'acc_get_property' cleanup 2020-01-10 23:24:36 +01:00
libgomp.h OpenMP/Fortran: Fix (re)mapping of allocatable/pointer arrays [PR96668] 2020-09-15 09:24:47 +02:00
libgomp.map libgomp: Add Fortran routine support for allocators 2020-07-15 08:33:20 +02:00
libgomp.spec.in
libgomp.texi libgomp: Fix hang when profiling OpenACC programs with CUDA 9.0 nvprof 2020-07-14 10:31:35 -07:00
lock.c
loop_ull.c
loop.c
Makefile.am build: Use -include instead of conditional include. 2020-06-22 21:31:48 +00:00
Makefile.in libgomp: Regenerate configure files with automake 1.15.1 2020-10-02 12:08:47 +02:00
oacc-async.c
oacc-cuda.c
oacc-host.c OpenACC 'acc_get_property' cleanup 2020-01-10 23:24:36 +01:00
oacc-init.c libgomp: Fix hang when profiling OpenACC programs with CUDA 9.0 nvprof 2020-07-14 10:31:35 -07:00
oacc-int.h
oacc-mem.c openacc: Deep copy attach/detach should not affect reference counts 2020-07-27 09:16:57 -07:00
oacc-parallel.c
oacc-plugin.c
oacc-plugin.h
oacc-profiling.c
oacc-target.c
omp_lib.f90.in libomp: Add omp_depend_kind to omp_lib.{f90,h} 2020-07-23 15:02:15 +02:00
omp_lib.h.in libomp: Add omp_depend_kind to omp_lib.{f90,h} 2020-07-23 15:02:15 +02:00
omp.h.in openmp: Change omp_atv_default value and rename omp_atv_sequential to omp_atv_serialized. 2020-07-09 11:29:30 +02:00
openacc_lib.h [OpenACC] Set 'acc_device_current = -1' 2020-04-29 09:54:37 +02:00
openacc.f90 [OpenACC] Set 'acc_device_current = -1' 2020-04-29 09:54:37 +02:00
openacc.h [OpenACC] Set 'acc_device_current = -1' 2020-04-29 09:54:37 +02:00
ordered.c
parallel.c libgomp: Enforce 1-thread limit in subteams 2020-09-30 17:37:31 +01:00
priority_queue.c
priority_queue.h
sections.c
secure_getenv.h
single.c
splay-tree.c
splay-tree.h
target.c libgomp/target.c: Silence -Wuninitialized warning 2020-09-15 21:28:40 +02:00
task.c
taskloop.c
team.c openmp: Add basic library allocator support. 2020-05-19 10:11:01 +02:00
teams.c
work.c