PR84878 fix adds an assertion which can fail, e.g. when stack pointer
is adjusted inside the loop. We have to prevent it and search earlier
for any 'strange' instruction. The solution is to skip the whole loop
if using 'note_stores' we found that one of hard registers is in
'df->regular_block_artificial_uses' set.
Also patch properly prohibit not single-set instruction in loop body.
gcc/ChangeLog:
PR rtl-optimization/100225
PR rtl-optimization/84878
* modulo-sched.c (sms_schedule): Use note_stores to skip loops
where we have an instruction which touches (writes) any hard
register from df->regular_block_artificial_uses set.
Allow not-single-set instruction only right before basic block
tail.
gcc/testsuite/ChangeLog:
PR rtl-optimization/100225
PR rtl-optimization/84878
* gcc.dg/pr100225.c: New test.
libgomp/ChangeLog:
* testsuite/libgomp.oacc-c-c++-common/atomic_capture-3.c: New test.
Consider the test-case libgomp.c/pr81778.c added in this commit, with
this core loop (note: CANARY_SIZE set to 0 for simplicity):
...
int s = 1;
#pragma omp target simd
for (int i = N - 1; i > -1; i -= s)
a[i] = 1;
...
which, given that N is 32, sets a[0..31] to 1.
After omp-expand, this looks like:
...
<bb 5> :
simduid.7 = .GOMP_SIMT_ENTER (simduid.7);
.omp_simt.8 = .GOMP_SIMT_ENTER_ALLOC (simduid.7);
D.3193 = -s;
s.9 = s;
D.3204 = .GOMP_SIMT_LANE ();
D.3205 = -s.9;
D.3206 = (int) D.3204;
D.3207 = D.3205 * D.3206;
i = D.3207 + 31;
D.3209 = 0;
D.3210 = -s.9;
D.3211 = D.3210 - i;
D.3210 = -s.9;
D.3212 = D.3211 / D.3210;
D.3213 = (unsigned int) D.3212;
D.3213 = i >= 0 ? D.3213 : 0;
<bb 19> :
if (D.3209 < D.3213)
goto <bb 6>; [87.50%]
else
goto <bb 7>; [12.50%]
<bb 6> :
a[i] = 1;
D.3215 = -s.9;
D.3219 = .GOMP_SIMT_VF ();
D.3216 = (int) D.3219;
D.3220 = D.3215 * D.3216;
i = D.3220 + i;
D.3209 = D.3209 + 1;
goto <bb 19>; [100.00%]
...
On nvptx, the first time bb6 is executed, i is in the 0..31 range (depending
on the lane that is executing) at bb entry.
So we have the following sequence:
- a[0..31] is set to 1
- i is updated to -32..-1
- D.3209 is updated to 1 (being 0 initially)
- bb19 is executed, and if condition (D.3209 < D.3213) == (1 < 32) evaluates
to true
- bb6 is once more executed, which should not happen because all the elements
that needed to be handled were already handled.
- consequently, elements that should not be written are written
- with CANARY_SIZE == 0, we may run into a libgomp error:
...
libgomp: cuCtxSynchronize error: an illegal memory access was encountered
...
and with CANARY_SIZE unmodified, we run into:
...
Expected 0, got 1 at base[-961]
Aborted (core dumped)
...
The cause of this is as follows:
- because the step s is a variable rather than a constant, an alternative
IV (D.3209 in our example) is generated in expand_omp_simd, and the
loop condition is tested in terms of the alternative IV rather than
the original IV (i in our example).
- the SIMT code in expand_omp_simd works by modifying step and initial value.
- The initial value fd->loop.n1 is loaded into a variable n1, which is
modified by the SIMT code and then used there-after.
- The step fd->loop.step is loaded into a variable step, which is modified
by the SIMT code, but afterwards there are uses of both step and
fd->loop.step.
- There are uses of fd->loop.step in the alternative IV handling code,
which should use step instead.
Fix this by introducing an additional variable orig_step, which is not
modified by the SIMT code and replacing all remaining uses of fd->loop.step
by either step or orig_step.
Build on x86_64-linux with nvptx accelerator, tested libgomp.
This fixes for-5.c and for-6.c FAILs I'm currently seeing on a quadro m1200
with driver 450.66.
gcc/ChangeLog:
2020-10-02 Tom de Vries <tdevries@suse.de>
* omp-expand.c (expand_omp_simd): Add step_orig, and replace uses of
fd->loop.step by either step or orig_step.
libgomp/ChangeLog:
2020-10-02 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.c/pr81778.c: New test.
When running the test-case included in this patch using an
nvptx accelerator, it fails in execution.
The problem is that the expansion of GOMP_SIMT_XCHG_BFLY is optimized away
during pass_jump as "trivially dead insns".
This is caused by this code in expand_GOMP_SIMT_XCHG_BFLY:
...
class expand_operand ops[3];
create_output_operand (&ops[0], target, mode);
...
expand_insn (targetm.code_for_omp_simt_xchg_bfly, 3, ops);
...
which doesn't guarantee that target is assigned to by the expanded insn.
F.i., if target is:
...
(gdb) call debug_rtx ( target )
(subreg/s/u:QI (reg:SI 40 [ _61 ]) 0)
...
then after expand_insn, we have:
...
(gdb) call debug_rtx ( ops[0].value )
(reg:QI 57)
...
See commit 3af3bec2e4d "internal-fn: Avoid dropping the lhs of some
calls [PR94941]" for a similar problem.
Fix this in the same way, by adding:
...
if (!rtx_equal_p (target, ops[0].value))
emit_move_insn (target, ops[0].value);
...
where applicable in the expand_GOMP_SIMT_* functions.
Tested libgomp on x86_64 with nvptx accelerator.
gcc/ChangeLog:
2021-04-28 Tom de Vries <tdevries@suse.de>
PR target/100232
* internal-fn.c (expand_GOMP_SIMT_ENTER_ALLOC)
(expand_GOMP_SIMT_LAST_LANE, expand_GOMP_SIMT_ORDERED_PRED)
(expand_GOMP_SIMT_VOTE_ANY, expand_GOMP_SIMT_XCHG_BFLY)
(expand_GOMP_SIMT_XCHG_IDX): Ensure target is assigned to.
If configured with --enable-offload-defaulted, configured but not installed
offload compilers and libgomp plugins are silently ignored. Useful for
distribution compilers where those are in separate optional packages.
2021-04-28 Jakub Jelinek <jakub@redhat.com>
Tobias Burnus <tobias@codesourcery.com>
ChangeLog:
* configure.ac (--enable-offload-defaulted): New.
* configure: Regenerate.
gcc/ChangeLog:
* configure.ac (OFFLOAD_DEFAULTED): AC_DEFINE if offload-defaulted.
* gcc.c (process_command): New variable.
(driver::maybe_putenv_OFFLOAD_TARGETS): If OFFLOAD_DEFAULTED,
set it if -foffload is defaulted.
* lto-wrapper.c (OFFLOAD_TARGET_DEFAULT_ENV): Define.
(compile_offload_image): If OFFLOAD_DEFAULTED and
OFFLOAD_TARGET_DEFAULT is in the environment, don't fail
if corresponding mkoffload can't be found.
(compile_images_for_offload_targets): Likewise. Free and clear
offload_names if no valid offload is found.
* config.in: Regenerate.
* configure: Regenerate.
libgomp/ChangeLog:
* configure.ac (OFFLOAD_DEFAULTED): AC_DEFINE if offload-defaulted.
* target.c (gomp_load_plugin_for_device): If set and if a plugin
can't be dlopened, silently assume it has no devices.
* Makefile.in: Regenerate.
* config.h.in: Regenerate.
* configure: Regenerate.
It turned out that a compiler built without offloading support
and one with can produce slightly different diagnostic.
Offloading support implies ENABLE_OFFLOAD which implies that
g->have_offload is set when offloading is actually needed.
In cgraphunit.c, the latter causes flag_generate_offload = 1,
which in turn affects tree.c's free_lang_data.
The result is that the front-end specific diagnostic gets reset
('tree_diagnostics_defaults (global_dc)'), which affects in this
case 'Warning' vs. 'warning' via the Fortran frontend.
Result: 'Warning:' vs. 'warning:'.
Side note: Other FE also override the diagnostic, leading to
similar differences, e.g. the C++ FE outputs mangled function
names differently, cf. patch thread.
libgomp/ChangeLog:
* testsuite/libgomp.oacc-fortran/par-reduction-2-1.f:
Use [Ww]arning in dg-bogus as FE diagnostic and default
diagnostic differ and the result depends on ENABLE_OFFLOAD.
* testsuite/libgomp.oacc-fortran/par-reduction-2-2.f: Likewise.
* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Likewise.
* testsuite/libgomp.oacc-fortran/parallel-reduction.f90: Likewise.
gcc/testsuite/ChangeLog:
* gfortran.dg/goacc/classify-serial.f95:
Use [Ww]arning in dg-bogus as FE diagnostic and default
diagnostic differ and the result depends on ENABLE_OFFLOAD.
* gfortran.dg/goacc/kernels-decompose-2.f95: Likewise.
* gfortran.dg/goacc/routine-module-mod-1.f90: Likewise.
libatomic isn't built for amdgcn but reduction-16.c adds it
via -foffload=-latomic when offloading for nvptx is enabled.
The following avoids linker errors when offloading to amdgcn is enabled
as well.
2021-04-21 Richard Biener <rguenther@suse.de>
libgomp/
* testsuite/libgomp.c-c++-common/reduction-16.c: Use -latomic
only on nvptx-none.
For the tests modified below, the effective target line has to be effective
when compiling for an offload target, except that variable-not-offloaded.c
would compile with unified-share memory and pr86416-*.c if long double/float128
is supported.
The previous check used a run-time device ability check. This new variant
now enables those dg- lines when _compiling_ for nvptx or gcn.
libgomp/ChangeLog:
* testsuite/lib/libgomp.exp (offload_target_to_openacc_device_type):
New, based on check_effective_target_offload_target_nvptx.
(check_effective_target_offload_target_nvptx): Call it.
(check_effective_target_offload_target_amdgcn): New.
* testsuite/libgomp.c-c++-common/function-not-offloaded.c:
Require target offload_target_nvptx || offload_target_amdgcn.
* testsuite/libgomp.c-c++-common/variable-not-offloaded.c: Likewise.
* testsuite/libgomp.c/pr86416-1.c: Likewise.
* testsuite/libgomp.c/pr86416-2.c: Likewise.
As can be seen under valgrind, the testcase didn't bind in the last part
the fortran pointers properly to the c pointers.
2021-04-14 Jakub Jelinek <jakub@redhat.com>
PR testsuite/100071
* testsuite/libgomp.fortran/alloc-1.F90: Call c_f_pointer after last
cp = omp_alloc with cp, p arguments instead of cq, q and call
c_f_pointer after last cq = omp_alloc with cq, q.
We have seen an ICE both on trunk and devel/omp/gcc-10 branches which can
be reprodued with this simple testcase. It occurs if an OpenACC loop has
a collapse clause and any of the loop being collapsed uses GT or GE
condition. This issue is specific to OpenACC.
int main (void)
{
int ix, iy;
int dim_x = 16, dim_y = 16;
{
for (iy = dim_y - 1; iy > 0; --iy)
for (ix = dim_x - 1; ix > 0; --ix)
;
}
}
The problem is caused by a failing assertion in expand_oacc_collapse_init.
It checks that cond_code for fd->loop should be same as cond_code for all
the loops that are being collapsed. As the cond_code for fd->loop is
LT_EXPR with collapse clause (set at the end of omp_extract_for_data),
this assertion forces that all the loop in collapse clause should use
< operator.
There does not seem to be anything in the code which demands this
condition as loop with > condition works ok otherwise. I digged old
mailing list a bit but could not find any discussion on this change.
Looking at the code, expand_oacc_for checks that fd->loop->cond_code is
either LT_EXPR or GT_EXPR. I guess the original intention was to have
similar checks on the loop which are being collapsed. But the way check
was written does not acheive that.
I have fixed it by modifying the check in the assertion to be same as
check on fd->loop->cond_code.
I tested goacc and libgomp (with nvptx offloading) and did not see any
regression. I have added new tests to check collapse with GT/GE condition.
PR middle-end/98088
gcc/
* omp-expand.c (expand_oacc_collapse_init): Update condition in
a gcc_assert.
gcc/testsuite/
* c-c++-common/goacc/collapse-2.c: New.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/collapse-2.c: Add check
for loop with GT/GE condition.
* testsuite/libgomp.oacc-c-c++-common/collapse-3.c: Likewise.
pthread_setspecific second argument is const void *, so that one can
call it even with pointers to const, but the function only stores the
pointer and does nothing else, so the new assumption of -Wmaybe-uninitialized
that functions taking such pointers will read from what those pointers
will point to is wrong. Maybe it would be useful to have some whitelist
of functions that surely don't do that.
Anyway, in this case it is easy to workaround the warning by moving the
pthread_setspecific call after the initialization without slowing anything
down.
2021-04-09 Jakub Jelinek <jakub@redhat.com>
PR libgomp/99984
* team.c (gomp_thread_start): Call pthread_setspecific for
!(defined HAVE_TLS || defined USE_EMUTLS) only after local_thr
has been initialized to avoid false positive warning.
For unknown reasons, this had gotten added for the libgomp HSA plugin in commit
b8d89b03db5f212919e4571671ebb4f5f8b1e19d (r242749) "Remove build dependence on
HSA run-time", and later propagated into the GCN plugin.
libgomp/
* plugin/plugin-gcn.c (init_environment_variables): Don't prepend
the 'HSA_RUNTIME_LIB' path to 'libhsa-runtime64.so'.
* plugin/configfrag.ac (HSA_RUNTIME_LIB): Clean up.
* config.h.in: Regenerate.
* configure: Likewise.
Fixup for recent commit d28f3da11d8c0aed9b746689d723022a9b5ec04c "openacc: Fix
lowering for derived-type mappings through array elements". With nvptx
offloading we see the usual:
[...]/libgomp.oacc-fortran/derivedtypes-arrays-1.f90: In function 'MAIN__._omp_fn.0':
[...]/libgomp.oacc-fortran/derivedtypes-arrays-1.f90:90:40: warning: using vector_length (32), ignoring 1
libgomp/
* testsuite/libgomp.oacc-fortran/derivedtypes-arrays-1.f90:
OpenACC 'serial' construct diagnostic for nvptx offloading.
For variables with 'declare target' attribute,
varpool_node::get_create marks variables as offload; however,
if the node already exists, it is not updated. C/C++ may tag
decl with 'declare target implicit', which may only be after
varpool creation turned into 'declare target' or 'declare target link';
in this case, the tagging has to happen in the FE.
gcc/c/ChangeLog:
PR c++/99509
* c-decl.c (finish_decl): For 'omp declare target implicit' vars,
ensure that the varpool node is marked as offloadable.
gcc/cp/ChangeLog:
PR c++/99509
* decl.c (cp_finish_decl): For 'omp declare target implicit' vars,
ensure that the varpool node is marked as offloadable.
libgomp/ChangeLog:
PR c++/99509
* testsuite/libgomp.c-c++-common/declare_target-1.c: New test.
Some gcc configurations default to -m32 but support -m64 too. This patch
just makes the ILP32 tests more reliable by following what e.g. libsanitizer
configury does.
2021-03-04 Jakub Jelinek <jakub@redhat.com>
* configure.ac: Add AC_CHECK_SIZEOF([void *]).
* plugin/configfrag.ac: Check $ac_cv_sizeof_void_p value instead of
checking of -m32 or -mx32 options on the command line.
* config.h.in: Regenerated.
* configure: Regenerated.
This fails everywhere on Darwin, which does not have support for
symbol aliases. Add a dg-require-alias to UNSUPPORT it.
libgomp/ChangeLog:
* testsuite/libgomp.c-c++-common/pr96390.c: Require alias
support from the target.
This adds support for the task detach clause to taskwait and taskgroup, and
simplifies the handling of the detach clause by moving most of the extra
handling required for detach tasks to omp_fulfill_event.
2021-02-25 Kwok Cheung Yeung <kcy@codesourcery.com>
Jakub Jelinek <jakub@redhat.com>
libgomp/
PR libgomp/98738
* libgomp.h (enum gomp_task_kind): Add GOMP_TASK_DETACHED.
(struct gomp_task): Replace detach and completion_sem fields with
union containing completion_sem and detach_team. Add deferred_p
field.
(struct gomp_team): Remove task_detach_queue.
* task.c: Include assert.h.
(gomp_init_task): Initialize deferred_p and completion_sem fields.
Rearrange initialization order of fields.
(task_fulfilled_p): Delete.
(GOMP_task): Use address of task as the event handle. Remove
initialization of detach field. Initialize deferred_p field.
Use automatic local for completion_sem. Initialize detach_team field
for deferred tasks.
(gomp_barrier_handle_tasks): Remove handling of task_detach_queue.
Set kind of suspended detach task to GOMP_TASK_DETACHED and
decrement task_running_count. Move finish_cancelled block out of
else branch. Relocate call to gomp_team_barrier_done.
(GOMP_taskwait): Handle tasks with completion events that have not
been fulfilled.
(GOMP_taskgroup_end): Likewise.
(omp_fulfill_event): Use address of task as event handle. Post to
completion_sem for undeferred tasks. Clear detach_team if task
has not finished. For finished tasks, handle post-execution tasks,
call gomp_team_barrier_wake if necessary, and free task.
* team.c (gomp_new_team): Remove initialization of task_detach_queue.
(free_team): Remove free of task_detach_queue.
* testsuite/libgomp.c-c++-common/task-detach-1.c: Fix formatting.
* testsuite/libgomp.c-c++-common/task-detach-2.c: Fix formatting.
* testsuite/libgomp.c-c++-common/task-detach-3.c: Fix formatting.
* testsuite/libgomp.c-c++-common/task-detach-4.c: Fix formatting.
* testsuite/libgomp.c-c++-common/task-detach-5.c: Fix formatting.
Change data-sharing of detach events on enclosing parallel to private.
* testsuite/libgomp.c-c++-common/task-detach-6.c: Likewise. Remove
taskwait directive.
* testsuite/libgomp.c-c++-common/task-detach-7.c: New.
* testsuite/libgomp.c-c++-common/task-detach-8.c: New.
* testsuite/libgomp.c-c++-common/task-detach-9.c: New.
* testsuite/libgomp.c-c++-common/task-detach-10.c: New.
* testsuite/libgomp.c-c++-common/task-detach-11.c: New.
* testsuite/libgomp.fortran/task-detach-1.f90: Fix formatting.
* testsuite/libgomp.fortran/task-detach-2.f90: Fix formatting.
* testsuite/libgomp.fortran/task-detach-3.f90: Fix formatting.
* testsuite/libgomp.fortran/task-detach-4.f90: Fix formatting.
* testsuite/libgomp.fortran/task-detach-5.f90: Fix formatting.
Change data-sharing of detach events on enclosing parallel to private.
* testsuite/libgomp.fortran/task-detach-6.f90: Likewise. Remove
taskwait directive.
* testsuite/libgomp.fortran/task-detach-7.f90: New.
* testsuite/libgomp.fortran/task-detach-8.f90: New.
* testsuite/libgomp.fortran/task-detach-9.f90: New.
* testsuite/libgomp.fortran/task-detach-10.f90: New.
* testsuite/libgomp.fortran/task-detach-11.f90: New.
gcc/fortran/ChangeLog:
PR fortran/99171
* trans-openmp.c (gfc_omp_is_optional_argument): Regard optional
dummy procs as nonoptional as no special treatment is needed.
libgomp/ChangeLog:
PR fortran/99171
* testsuite/libgomp.fortran/dummy-procs-1.f90: New test.
This patch disallows selecting components of array sections in update
directives for OpenACC, as specified in OpenACC 3.0, "2.14.4. Update
Directive":
In Fortran, members of variables of derived type may appear, including
a subarray of a member. Members of subarrays of derived type may
not appear.
The diagnostic for attempting to use the same construct on other
directives has also been improved.
gcc/fortran/
* openmp.c (resolve_omp_clauses): Disallow selecting components
of arrays of derived type.
gcc/testsuite/
* gfortran.dg/goacc/array-with-dt-2.f90: Remove expected errors.
* gfortran.dg/goacc/array-with-dt-6.f90: New test.
* gfortran.dg/goacc/mapping-tests-2.f90: Update expected error.
* gfortran.dg/goacc/ref_inquiry.f90: Update expected errors.
* gfortran.dg/gomp/ref_inquiry.f90: Likewise.
libgomp/
* testsuite/libgomp.oacc-fortran/array-stride-dt-1.f90: Remove
expected errors.
This patch fixes lowering of derived-type mappings which select elements
of arrays of derived types, and similar. These would previously lead
to ICEs.
With this change, OpenACC directives can pass through constructs that
are no longer recognized by the gimplifier, hence alterations are needed
there also.
gcc/fortran/
* trans-openmp.c (gfc_trans_omp_clauses): Handle element selection
for arrays of derived types.
gcc/
* gimplify.c (gimplify_scan_omp_clauses): Handle ATTACH_DETACH
for non-decls.
gcc/testsuite/
* gfortran.dg/goacc/array-with-dt-1.f90: New test.
* gfortran.dg/goacc/array-with-dt-3.f90: Likewise.
* gfortran.dg/goacc/array-with-dt-4.f90: Likewise.
* gfortran.dg/goacc/array-with-dt-5.f90: Likewise.
* gfortran.dg/goacc/derived-chartypes-1.f90: Re-enable test.
* gfortran.dg/goacc/derived-chartypes-2.f90: Likewise.
* gfortran.dg/goacc/derived-classtypes-1.f95: Uncomment
previously-broken directives.
libgomp/
* testsuite/libgomp.oacc-fortran/derivedtypes-arrays-1.f90: New test.
* testsuite/libgomp.oacc-fortran/update-dt-array.f90: Likewise.