fc14ff6111
Consider the test-case libgomp.c/pr81778.c added in this commit, with this core loop (note: CANARY_SIZE set to 0 for simplicity): ... int s = 1; #pragma omp target simd for (int i = N - 1; i > -1; i -= s) a[i] = 1; ... which, given that N is 32, sets a[0..31] to 1. After omp-expand, this looks like: ... <bb 5> : simduid.7 = .GOMP_SIMT_ENTER (simduid.7); .omp_simt.8 = .GOMP_SIMT_ENTER_ALLOC (simduid.7); D.3193 = -s; s.9 = s; D.3204 = .GOMP_SIMT_LANE (); D.3205 = -s.9; D.3206 = (int) D.3204; D.3207 = D.3205 * D.3206; i = D.3207 + 31; D.3209 = 0; D.3210 = -s.9; D.3211 = D.3210 - i; D.3210 = -s.9; D.3212 = D.3211 / D.3210; D.3213 = (unsigned int) D.3212; D.3213 = i >= 0 ? D.3213 : 0; <bb 19> : if (D.3209 < D.3213) goto <bb 6>; [87.50%] else goto <bb 7>; [12.50%] <bb 6> : a[i] = 1; D.3215 = -s.9; D.3219 = .GOMP_SIMT_VF (); D.3216 = (int) D.3219; D.3220 = D.3215 * D.3216; i = D.3220 + i; D.3209 = D.3209 + 1; goto <bb 19>; [100.00%] ... On nvptx, the first time bb6 is executed, i is in the 0..31 range (depending on the lane that is executing) at bb entry. So we have the following sequence: - a[0..31] is set to 1 - i is updated to -32..-1 - D.3209 is updated to 1 (being 0 initially) - bb19 is executed, and if condition (D.3209 < D.3213) == (1 < 32) evaluates to true - bb6 is once more executed, which should not happen because all the elements that needed to be handled were already handled. - consequently, elements that should not be written are written - with CANARY_SIZE == 0, we may run into a libgomp error: ... libgomp: cuCtxSynchronize error: an illegal memory access was encountered ... and with CANARY_SIZE unmodified, we run into: ... Expected 0, got 1 at base[-961] Aborted (core dumped) ... The cause of this is as follows: - because the step s is a variable rather than a constant, an alternative IV (D.3209 in our example) is generated in expand_omp_simd, and the loop condition is tested in terms of the alternative IV rather than the original IV (i in our example). - the SIMT code in expand_omp_simd works by modifying step and initial value. - The initial value fd->loop.n1 is loaded into a variable n1, which is modified by the SIMT code and then used there-after. - The step fd->loop.step is loaded into a variable step, which is modified by the SIMT code, but afterwards there are uses of both step and fd->loop.step. - There are uses of fd->loop.step in the alternative IV handling code, which should use step instead. Fix this by introducing an additional variable orig_step, which is not modified by the SIMT code and replacing all remaining uses of fd->loop.step by either step or orig_step. Build on x86_64-linux with nvptx accelerator, tested libgomp. This fixes for-5.c and for-6.c FAILs I'm currently seeing on a quadro m1200 with driver 450.66. gcc/ChangeLog: 2020-10-02 Tom de Vries <tdevries@suse.de> * omp-expand.c (expand_omp_simd): Add step_orig, and replace uses of fd->loop.step by either step or orig_step. libgomp/ChangeLog: 2020-10-02 Tom de Vries <tdevries@suse.de> * testsuite/libgomp.c/pr81778.c: New test. |
||
---|---|---|
.. | ||
config | ||
plugin | ||
testsuite | ||
.gitattributes | ||
acc_prof.h | ||
acinclude.m4 | ||
aclocal.m4 | ||
affinity-fmt.c | ||
affinity.c | ||
alloc.c | ||
allocator.c | ||
atomic.c | ||
barrier.c | ||
ChangeLog | ||
ChangeLog.graphite | ||
config.h.in | ||
configure | ||
configure.ac | ||
configure.tgt | ||
critical.c | ||
env.c | ||
error.c | ||
fortran.c | ||
hashtab.h | ||
icv-device.c | ||
icv.c | ||
iter_ull.c | ||
iter.c | ||
libgomp_f.h.in | ||
libgomp_g.h | ||
libgomp-plugin.c | ||
libgomp-plugin.h | ||
libgomp.h | ||
libgomp.map | ||
libgomp.spec.in | ||
libgomp.texi | ||
lock.c | ||
loop_ull.c | ||
loop.c | ||
Makefile.am | ||
Makefile.in | ||
oacc-async.c | ||
oacc-cuda.c | ||
oacc-host.c | ||
oacc-init.c | ||
oacc-int.h | ||
oacc-mem.c | ||
oacc-parallel.c | ||
oacc-plugin.c | ||
oacc-plugin.h | ||
oacc-profiling.c | ||
oacc-target.c | ||
omp_lib.f90.in | ||
omp_lib.h.in | ||
omp.h.in | ||
openacc_lib.h | ||
openacc.f90 | ||
openacc.h | ||
ordered.c | ||
parallel.c | ||
priority_queue.c | ||
priority_queue.h | ||
sections.c | ||
secure_getenv.h | ||
single.c | ||
splay-tree.c | ||
splay-tree.h | ||
target.c | ||
task.c | ||
taskloop.c | ||
team.c | ||
teams.c | ||
work.c |