gcc/ChangeLog:
* omp-low.c (finish_taskreg_scan): Use the proper detach decl.
libgomp/ChangeLog:
* testsuite/libgomp.c-c++-common/task-detach-12.c: New test.
* testsuite/libgomp.fortran/task-detach-12.f90: New test.
When a taskloop doesn't have any iterations, GOMP_taskloop* takes an early
return, doesn't create any tasks and more importantly, doesn't create
a taskgroup and doesn't register task reductions. But, the code emitted
in the callers assumes task reductions have been registered and performs
the reduction handling and task reduction unregistration. The pointer
to the task reduction private variables is reused, on input it is the alignment
and only on output it is the pointer, so in the case taskloop with no iterations
the caller attempts to dereference the alignment value as if it was a pointer
and crashes. We could in the early returns register the task reductions
only to have them looped over and unregistered in the caller, but I think
it is better to tell the caller there is nothing to task reduce and bypass
all that.
2021-05-11 Jakub Jelinek <jakub@redhat.com>
PR middle-end/100471
* omp-low.c (lower_omp_task_reductions): For OMP_TASKLOOP, if data
is 0, bypass the reduction loop including
GOMP_taskgroup_reduction_unregister call.
* taskloop.c (GOMP_taskloop): If GOMP_TASK_FLAG_REDUCTION and not
GOMP_TASK_FLAG_NOGROUP, when doing early return clear the task
reduction pointer.
* testsuite/libgomp.c/task-reduction-4.c: New test.
2021-05-07 Tobias Burnus <tobias@codesourcery.com>
Tom de Vries <tdevries@suse.de>
gcc/ChangeLog:
* omp-low.c (lower_rec_simd_input_clauses): Set max_vf = 1 if
a truth_value_p reduction variable is nonintegral.
libgomp/ChangeLog:
* testsuite/libgomp.c-c++-common/reduction-5.c: New test, testing
complex/floating-point || + && reduction with 'omp target'.
* testsuite/libgomp.c-c++-common/reduction-6.c: Likewise.
C/C++ permit logical AND and logical OR also with floating-point or complex
arguments by doing an unequal zero comparison; the result is an 'int' with
value one or zero. Hence, those are also permitted as reduction variable,
even though it is not the most sensible thing to do.
gcc/c/ChangeLog:
* c-typeck.c (c_finish_omp_clauses): Accept float + complex
for || and && reductions.
gcc/cp/ChangeLog:
* semantics.c (finish_omp_reduction_clause): Accept float + complex
for || and && reductions.
gcc/ChangeLog:
* omp-low.c (lower_rec_input_clauses, lower_reduction_clauses): Handle
&& and || with floating-point and complex arguments.
gcc/testsuite/ChangeLog:
* gcc.dg/gomp/clause-1.c: Use 'reduction(&:..)' instead of '...(&&:..)'.
libgomp/ChangeLog:
* testsuite/libgomp.c-c++-common/reduction-1.c: New test.
* testsuite/libgomp.c-c++-common/reduction-2.c: New test.
* testsuite/libgomp.c-c++-common/reduction-3.c: New test.
The test-case included in this patch contains this target region:
...
for (int i0 = 0 ; i0 < N0 ; i0++ )
counter_N0.i += 1;
...
When running with nvptx accelerator, the counter variable is expected to
be N0 after the region, but instead is N0 / 32. The problem is that rather
than getting the result for all warp lanes, we get it for just one lane.
This is caused by the implementation of SIMT being incomplete. It handles
regular reductions, but appearantly not user-defined reductions.
For now, handle this by disabling SIMT in this case, specifically by setting
sctx->max_vf to 1.
Tested libgomp on x86_64-linux with nvptx accelerator.
gcc/ChangeLog:
2021-05-03 Tom de Vries <tdevries@suse.de>
PR target/100321
* omp-low.c (lower_rec_input_clauses): Disable SIMT for user-defined
reduction.
libgomp/ChangeLog:
2021-05-03 Tom de Vries <tdevries@suse.de>
PR target/100321
* testsuite/libgomp.c/target-44.c: New test.
PR84878 fix adds an assertion which can fail, e.g. when stack pointer
is adjusted inside the loop. We have to prevent it and search earlier
for any 'strange' instruction. The solution is to skip the whole loop
if using 'note_stores' we found that one of hard registers is in
'df->regular_block_artificial_uses' set.
Also patch properly prohibit not single-set instruction in loop body.
gcc/ChangeLog:
PR rtl-optimization/100225
PR rtl-optimization/84878
* modulo-sched.c (sms_schedule): Use note_stores to skip loops
where we have an instruction which touches (writes) any hard
register from df->regular_block_artificial_uses set.
Allow not-single-set instruction only right before basic block
tail.
gcc/testsuite/ChangeLog:
PR rtl-optimization/100225
PR rtl-optimization/84878
* gcc.dg/pr100225.c: New test.
libgomp/ChangeLog:
* testsuite/libgomp.oacc-c-c++-common/atomic_capture-3.c: New test.
Consider the test-case libgomp.c/pr81778.c added in this commit, with
this core loop (note: CANARY_SIZE set to 0 for simplicity):
...
int s = 1;
#pragma omp target simd
for (int i = N - 1; i > -1; i -= s)
a[i] = 1;
...
which, given that N is 32, sets a[0..31] to 1.
After omp-expand, this looks like:
...
<bb 5> :
simduid.7 = .GOMP_SIMT_ENTER (simduid.7);
.omp_simt.8 = .GOMP_SIMT_ENTER_ALLOC (simduid.7);
D.3193 = -s;
s.9 = s;
D.3204 = .GOMP_SIMT_LANE ();
D.3205 = -s.9;
D.3206 = (int) D.3204;
D.3207 = D.3205 * D.3206;
i = D.3207 + 31;
D.3209 = 0;
D.3210 = -s.9;
D.3211 = D.3210 - i;
D.3210 = -s.9;
D.3212 = D.3211 / D.3210;
D.3213 = (unsigned int) D.3212;
D.3213 = i >= 0 ? D.3213 : 0;
<bb 19> :
if (D.3209 < D.3213)
goto <bb 6>; [87.50%]
else
goto <bb 7>; [12.50%]
<bb 6> :
a[i] = 1;
D.3215 = -s.9;
D.3219 = .GOMP_SIMT_VF ();
D.3216 = (int) D.3219;
D.3220 = D.3215 * D.3216;
i = D.3220 + i;
D.3209 = D.3209 + 1;
goto <bb 19>; [100.00%]
...
On nvptx, the first time bb6 is executed, i is in the 0..31 range (depending
on the lane that is executing) at bb entry.
So we have the following sequence:
- a[0..31] is set to 1
- i is updated to -32..-1
- D.3209 is updated to 1 (being 0 initially)
- bb19 is executed, and if condition (D.3209 < D.3213) == (1 < 32) evaluates
to true
- bb6 is once more executed, which should not happen because all the elements
that needed to be handled were already handled.
- consequently, elements that should not be written are written
- with CANARY_SIZE == 0, we may run into a libgomp error:
...
libgomp: cuCtxSynchronize error: an illegal memory access was encountered
...
and with CANARY_SIZE unmodified, we run into:
...
Expected 0, got 1 at base[-961]
Aborted (core dumped)
...
The cause of this is as follows:
- because the step s is a variable rather than a constant, an alternative
IV (D.3209 in our example) is generated in expand_omp_simd, and the
loop condition is tested in terms of the alternative IV rather than
the original IV (i in our example).
- the SIMT code in expand_omp_simd works by modifying step and initial value.
- The initial value fd->loop.n1 is loaded into a variable n1, which is
modified by the SIMT code and then used there-after.
- The step fd->loop.step is loaded into a variable step, which is modified
by the SIMT code, but afterwards there are uses of both step and
fd->loop.step.
- There are uses of fd->loop.step in the alternative IV handling code,
which should use step instead.
Fix this by introducing an additional variable orig_step, which is not
modified by the SIMT code and replacing all remaining uses of fd->loop.step
by either step or orig_step.
Build on x86_64-linux with nvptx accelerator, tested libgomp.
This fixes for-5.c and for-6.c FAILs I'm currently seeing on a quadro m1200
with driver 450.66.
gcc/ChangeLog:
2020-10-02 Tom de Vries <tdevries@suse.de>
* omp-expand.c (expand_omp_simd): Add step_orig, and replace uses of
fd->loop.step by either step or orig_step.
libgomp/ChangeLog:
2020-10-02 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.c/pr81778.c: New test.
When running the test-case included in this patch using an
nvptx accelerator, it fails in execution.
The problem is that the expansion of GOMP_SIMT_XCHG_BFLY is optimized away
during pass_jump as "trivially dead insns".
This is caused by this code in expand_GOMP_SIMT_XCHG_BFLY:
...
class expand_operand ops[3];
create_output_operand (&ops[0], target, mode);
...
expand_insn (targetm.code_for_omp_simt_xchg_bfly, 3, ops);
...
which doesn't guarantee that target is assigned to by the expanded insn.
F.i., if target is:
...
(gdb) call debug_rtx ( target )
(subreg/s/u:QI (reg:SI 40 [ _61 ]) 0)
...
then after expand_insn, we have:
...
(gdb) call debug_rtx ( ops[0].value )
(reg:QI 57)
...
See commit 3af3bec2e4 "internal-fn: Avoid dropping the lhs of some
calls [PR94941]" for a similar problem.
Fix this in the same way, by adding:
...
if (!rtx_equal_p (target, ops[0].value))
emit_move_insn (target, ops[0].value);
...
where applicable in the expand_GOMP_SIMT_* functions.
Tested libgomp on x86_64 with nvptx accelerator.
gcc/ChangeLog:
2021-04-28 Tom de Vries <tdevries@suse.de>
PR target/100232
* internal-fn.c (expand_GOMP_SIMT_ENTER_ALLOC)
(expand_GOMP_SIMT_LAST_LANE, expand_GOMP_SIMT_ORDERED_PRED)
(expand_GOMP_SIMT_VOTE_ANY, expand_GOMP_SIMT_XCHG_BFLY)
(expand_GOMP_SIMT_XCHG_IDX): Ensure target is assigned to.
If configured with --enable-offload-defaulted, configured but not installed
offload compilers and libgomp plugins are silently ignored. Useful for
distribution compilers where those are in separate optional packages.
2021-04-28 Jakub Jelinek <jakub@redhat.com>
Tobias Burnus <tobias@codesourcery.com>
ChangeLog:
* configure.ac (--enable-offload-defaulted): New.
* configure: Regenerate.
gcc/ChangeLog:
* configure.ac (OFFLOAD_DEFAULTED): AC_DEFINE if offload-defaulted.
* gcc.c (process_command): New variable.
(driver::maybe_putenv_OFFLOAD_TARGETS): If OFFLOAD_DEFAULTED,
set it if -foffload is defaulted.
* lto-wrapper.c (OFFLOAD_TARGET_DEFAULT_ENV): Define.
(compile_offload_image): If OFFLOAD_DEFAULTED and
OFFLOAD_TARGET_DEFAULT is in the environment, don't fail
if corresponding mkoffload can't be found.
(compile_images_for_offload_targets): Likewise. Free and clear
offload_names if no valid offload is found.
* config.in: Regenerate.
* configure: Regenerate.
libgomp/ChangeLog:
* configure.ac (OFFLOAD_DEFAULTED): AC_DEFINE if offload-defaulted.
* target.c (gomp_load_plugin_for_device): If set and if a plugin
can't be dlopened, silently assume it has no devices.
* Makefile.in: Regenerate.
* config.h.in: Regenerate.
* configure: Regenerate.
It turned out that a compiler built without offloading support
and one with can produce slightly different diagnostic.
Offloading support implies ENABLE_OFFLOAD which implies that
g->have_offload is set when offloading is actually needed.
In cgraphunit.c, the latter causes flag_generate_offload = 1,
which in turn affects tree.c's free_lang_data.
The result is that the front-end specific diagnostic gets reset
('tree_diagnostics_defaults (global_dc)'), which affects in this
case 'Warning' vs. 'warning' via the Fortran frontend.
Result: 'Warning:' vs. 'warning:'.
Side note: Other FE also override the diagnostic, leading to
similar differences, e.g. the C++ FE outputs mangled function
names differently, cf. patch thread.
libgomp/ChangeLog:
* testsuite/libgomp.oacc-fortran/par-reduction-2-1.f:
Use [Ww]arning in dg-bogus as FE diagnostic and default
diagnostic differ and the result depends on ENABLE_OFFLOAD.
* testsuite/libgomp.oacc-fortran/par-reduction-2-2.f: Likewise.
* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Likewise.
* testsuite/libgomp.oacc-fortran/parallel-reduction.f90: Likewise.
gcc/testsuite/ChangeLog:
* gfortran.dg/goacc/classify-serial.f95:
Use [Ww]arning in dg-bogus as FE diagnostic and default
diagnostic differ and the result depends on ENABLE_OFFLOAD.
* gfortran.dg/goacc/kernels-decompose-2.f95: Likewise.
* gfortran.dg/goacc/routine-module-mod-1.f90: Likewise.
libatomic isn't built for amdgcn but reduction-16.c adds it
via -foffload=-latomic when offloading for nvptx is enabled.
The following avoids linker errors when offloading to amdgcn is enabled
as well.
2021-04-21 Richard Biener <rguenther@suse.de>
libgomp/
* testsuite/libgomp.c-c++-common/reduction-16.c: Use -latomic
only on nvptx-none.
For the tests modified below, the effective target line has to be effective
when compiling for an offload target, except that variable-not-offloaded.c
would compile with unified-share memory and pr86416-*.c if long double/float128
is supported.
The previous check used a run-time device ability check. This new variant
now enables those dg- lines when _compiling_ for nvptx or gcn.
libgomp/ChangeLog:
* testsuite/lib/libgomp.exp (offload_target_to_openacc_device_type):
New, based on check_effective_target_offload_target_nvptx.
(check_effective_target_offload_target_nvptx): Call it.
(check_effective_target_offload_target_amdgcn): New.
* testsuite/libgomp.c-c++-common/function-not-offloaded.c:
Require target offload_target_nvptx || offload_target_amdgcn.
* testsuite/libgomp.c-c++-common/variable-not-offloaded.c: Likewise.
* testsuite/libgomp.c/pr86416-1.c: Likewise.
* testsuite/libgomp.c/pr86416-2.c: Likewise.
As can be seen under valgrind, the testcase didn't bind in the last part
the fortran pointers properly to the c pointers.
2021-04-14 Jakub Jelinek <jakub@redhat.com>
PR testsuite/100071
* testsuite/libgomp.fortran/alloc-1.F90: Call c_f_pointer after last
cp = omp_alloc with cp, p arguments instead of cq, q and call
c_f_pointer after last cq = omp_alloc with cq, q.
We have seen an ICE both on trunk and devel/omp/gcc-10 branches which can
be reprodued with this simple testcase. It occurs if an OpenACC loop has
a collapse clause and any of the loop being collapsed uses GT or GE
condition. This issue is specific to OpenACC.
int main (void)
{
int ix, iy;
int dim_x = 16, dim_y = 16;
{
for (iy = dim_y - 1; iy > 0; --iy)
for (ix = dim_x - 1; ix > 0; --ix)
;
}
}
The problem is caused by a failing assertion in expand_oacc_collapse_init.
It checks that cond_code for fd->loop should be same as cond_code for all
the loops that are being collapsed. As the cond_code for fd->loop is
LT_EXPR with collapse clause (set at the end of omp_extract_for_data),
this assertion forces that all the loop in collapse clause should use
< operator.
There does not seem to be anything in the code which demands this
condition as loop with > condition works ok otherwise. I digged old
mailing list a bit but could not find any discussion on this change.
Looking at the code, expand_oacc_for checks that fd->loop->cond_code is
either LT_EXPR or GT_EXPR. I guess the original intention was to have
similar checks on the loop which are being collapsed. But the way check
was written does not acheive that.
I have fixed it by modifying the check in the assertion to be same as
check on fd->loop->cond_code.
I tested goacc and libgomp (with nvptx offloading) and did not see any
regression. I have added new tests to check collapse with GT/GE condition.
PR middle-end/98088
gcc/
* omp-expand.c (expand_oacc_collapse_init): Update condition in
a gcc_assert.
gcc/testsuite/
* c-c++-common/goacc/collapse-2.c: New.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/collapse-2.c: Add check
for loop with GT/GE condition.
* testsuite/libgomp.oacc-c-c++-common/collapse-3.c: Likewise.
pthread_setspecific second argument is const void *, so that one can
call it even with pointers to const, but the function only stores the
pointer and does nothing else, so the new assumption of -Wmaybe-uninitialized
that functions taking such pointers will read from what those pointers
will point to is wrong. Maybe it would be useful to have some whitelist
of functions that surely don't do that.
Anyway, in this case it is easy to workaround the warning by moving the
pthread_setspecific call after the initialization without slowing anything
down.
2021-04-09 Jakub Jelinek <jakub@redhat.com>
PR libgomp/99984
* team.c (gomp_thread_start): Call pthread_setspecific for
!(defined HAVE_TLS || defined USE_EMUTLS) only after local_thr
has been initialized to avoid false positive warning.
For unknown reasons, this had gotten added for the libgomp HSA plugin in commit
b8d89b03db (r242749) "Remove build dependence on
HSA run-time", and later propagated into the GCN plugin.
libgomp/
* plugin/plugin-gcn.c (init_environment_variables): Don't prepend
the 'HSA_RUNTIME_LIB' path to 'libhsa-runtime64.so'.
* plugin/configfrag.ac (HSA_RUNTIME_LIB): Clean up.
* config.h.in: Regenerate.
* configure: Likewise.
Fixup for recent commit d28f3da11d "openacc: Fix
lowering for derived-type mappings through array elements". With nvptx
offloading we see the usual:
[...]/libgomp.oacc-fortran/derivedtypes-arrays-1.f90: In function 'MAIN__._omp_fn.0':
[...]/libgomp.oacc-fortran/derivedtypes-arrays-1.f90:90:40: warning: using vector_length (32), ignoring 1
libgomp/
* testsuite/libgomp.oacc-fortran/derivedtypes-arrays-1.f90:
OpenACC 'serial' construct diagnostic for nvptx offloading.
For variables with 'declare target' attribute,
varpool_node::get_create marks variables as offload; however,
if the node already exists, it is not updated. C/C++ may tag
decl with 'declare target implicit', which may only be after
varpool creation turned into 'declare target' or 'declare target link';
in this case, the tagging has to happen in the FE.
gcc/c/ChangeLog:
PR c++/99509
* c-decl.c (finish_decl): For 'omp declare target implicit' vars,
ensure that the varpool node is marked as offloadable.
gcc/cp/ChangeLog:
PR c++/99509
* decl.c (cp_finish_decl): For 'omp declare target implicit' vars,
ensure that the varpool node is marked as offloadable.
libgomp/ChangeLog:
PR c++/99509
* testsuite/libgomp.c-c++-common/declare_target-1.c: New test.
Some gcc configurations default to -m32 but support -m64 too. This patch
just makes the ILP32 tests more reliable by following what e.g. libsanitizer
configury does.
2021-03-04 Jakub Jelinek <jakub@redhat.com>
* configure.ac: Add AC_CHECK_SIZEOF([void *]).
* plugin/configfrag.ac: Check $ac_cv_sizeof_void_p value instead of
checking of -m32 or -mx32 options on the command line.
* config.h.in: Regenerated.
* configure: Regenerated.
This fails everywhere on Darwin, which does not have support for
symbol aliases. Add a dg-require-alias to UNSUPPORT it.
libgomp/ChangeLog:
* testsuite/libgomp.c-c++-common/pr96390.c: Require alias
support from the target.
This adds support for the task detach clause to taskwait and taskgroup, and
simplifies the handling of the detach clause by moving most of the extra
handling required for detach tasks to omp_fulfill_event.
2021-02-25 Kwok Cheung Yeung <kcy@codesourcery.com>
Jakub Jelinek <jakub@redhat.com>
libgomp/
PR libgomp/98738
* libgomp.h (enum gomp_task_kind): Add GOMP_TASK_DETACHED.
(struct gomp_task): Replace detach and completion_sem fields with
union containing completion_sem and detach_team. Add deferred_p
field.
(struct gomp_team): Remove task_detach_queue.
* task.c: Include assert.h.
(gomp_init_task): Initialize deferred_p and completion_sem fields.
Rearrange initialization order of fields.
(task_fulfilled_p): Delete.
(GOMP_task): Use address of task as the event handle. Remove
initialization of detach field. Initialize deferred_p field.
Use automatic local for completion_sem. Initialize detach_team field
for deferred tasks.
(gomp_barrier_handle_tasks): Remove handling of task_detach_queue.
Set kind of suspended detach task to GOMP_TASK_DETACHED and
decrement task_running_count. Move finish_cancelled block out of
else branch. Relocate call to gomp_team_barrier_done.
(GOMP_taskwait): Handle tasks with completion events that have not
been fulfilled.
(GOMP_taskgroup_end): Likewise.
(omp_fulfill_event): Use address of task as event handle. Post to
completion_sem for undeferred tasks. Clear detach_team if task
has not finished. For finished tasks, handle post-execution tasks,
call gomp_team_barrier_wake if necessary, and free task.
* team.c (gomp_new_team): Remove initialization of task_detach_queue.
(free_team): Remove free of task_detach_queue.
* testsuite/libgomp.c-c++-common/task-detach-1.c: Fix formatting.
* testsuite/libgomp.c-c++-common/task-detach-2.c: Fix formatting.
* testsuite/libgomp.c-c++-common/task-detach-3.c: Fix formatting.
* testsuite/libgomp.c-c++-common/task-detach-4.c: Fix formatting.
* testsuite/libgomp.c-c++-common/task-detach-5.c: Fix formatting.
Change data-sharing of detach events on enclosing parallel to private.
* testsuite/libgomp.c-c++-common/task-detach-6.c: Likewise. Remove
taskwait directive.
* testsuite/libgomp.c-c++-common/task-detach-7.c: New.
* testsuite/libgomp.c-c++-common/task-detach-8.c: New.
* testsuite/libgomp.c-c++-common/task-detach-9.c: New.
* testsuite/libgomp.c-c++-common/task-detach-10.c: New.
* testsuite/libgomp.c-c++-common/task-detach-11.c: New.
* testsuite/libgomp.fortran/task-detach-1.f90: Fix formatting.
* testsuite/libgomp.fortran/task-detach-2.f90: Fix formatting.
* testsuite/libgomp.fortran/task-detach-3.f90: Fix formatting.
* testsuite/libgomp.fortran/task-detach-4.f90: Fix formatting.
* testsuite/libgomp.fortran/task-detach-5.f90: Fix formatting.
Change data-sharing of detach events on enclosing parallel to private.
* testsuite/libgomp.fortran/task-detach-6.f90: Likewise. Remove
taskwait directive.
* testsuite/libgomp.fortran/task-detach-7.f90: New.
* testsuite/libgomp.fortran/task-detach-8.f90: New.
* testsuite/libgomp.fortran/task-detach-9.f90: New.
* testsuite/libgomp.fortran/task-detach-10.f90: New.
* testsuite/libgomp.fortran/task-detach-11.f90: New.
gcc/fortran/ChangeLog:
PR fortran/99171
* trans-openmp.c (gfc_omp_is_optional_argument): Regard optional
dummy procs as nonoptional as no special treatment is needed.
libgomp/ChangeLog:
PR fortran/99171
* testsuite/libgomp.fortran/dummy-procs-1.f90: New test.
This patch disallows selecting components of array sections in update
directives for OpenACC, as specified in OpenACC 3.0, "2.14.4. Update
Directive":
In Fortran, members of variables of derived type may appear, including
a subarray of a member. Members of subarrays of derived type may
not appear.
The diagnostic for attempting to use the same construct on other
directives has also been improved.
gcc/fortran/
* openmp.c (resolve_omp_clauses): Disallow selecting components
of arrays of derived type.
gcc/testsuite/
* gfortran.dg/goacc/array-with-dt-2.f90: Remove expected errors.
* gfortran.dg/goacc/array-with-dt-6.f90: New test.
* gfortran.dg/goacc/mapping-tests-2.f90: Update expected error.
* gfortran.dg/goacc/ref_inquiry.f90: Update expected errors.
* gfortran.dg/gomp/ref_inquiry.f90: Likewise.
libgomp/
* testsuite/libgomp.oacc-fortran/array-stride-dt-1.f90: Remove
expected errors.
This patch fixes lowering of derived-type mappings which select elements
of arrays of derived types, and similar. These would previously lead
to ICEs.
With this change, OpenACC directives can pass through constructs that
are no longer recognized by the gimplifier, hence alterations are needed
there also.
gcc/fortran/
* trans-openmp.c (gfc_trans_omp_clauses): Handle element selection
for arrays of derived types.
gcc/
* gimplify.c (gimplify_scan_omp_clauses): Handle ATTACH_DETACH
for non-decls.
gcc/testsuite/
* gfortran.dg/goacc/array-with-dt-1.f90: New test.
* gfortran.dg/goacc/array-with-dt-3.f90: Likewise.
* gfortran.dg/goacc/array-with-dt-4.f90: Likewise.
* gfortran.dg/goacc/array-with-dt-5.f90: Likewise.
* gfortran.dg/goacc/derived-chartypes-1.f90: Re-enable test.
* gfortran.dg/goacc/derived-chartypes-2.f90: Likewise.
* gfortran.dg/goacc/derived-classtypes-1.f95: Uncomment
previously-broken directives.
libgomp/
* testsuite/libgomp.oacc-fortran/derivedtypes-arrays-1.f90: New test.
* testsuite/libgomp.oacc-fortran/update-dt-array.f90: Likewise.
Linux man-pages 5.07 wrongly declares syscall output type as int. This error
was fixed in release 5.10, so this patch reverts my recent change.
2021-02-11 Uroš Bizjak <ubizjak@gmail.com>
libgomp/
* config/linux/x86/futex.h (__futex_wait):
Revert output type back to long.
(__futex_wake): Ditto.
(futex_wait): Update for revert.
(futex_wake): Ditto.
Move syscall asms to static inline wrapper functions to improve #ifdeffery.
Also correct output type to int and timeout type to void *.
2021-02-11 Uroš Bizjak <ubizjak@gmail.com>
libgomp/
* config/linux/x86/futex.h (__futex_wait): New static inline
wrapper function. Correct output type to int and
timeout type to void *.
(__futex_wake): New static inline wrapper function.
Correct output type to int.
(futex_wait): Use __futex_wait.
(futex_wake): Use __futex_wake.
This patch adds some XFAILs for PR98979 until the patch to fix them has
been approved. See:
https://gcc.gnu.org/pipermail/gcc-patches/2021-February/564711.html
gcc/testsuite/
PR fortran/98979
* gfortran.dg/goacc/array-with-dt-2.f90: Add expected errors.
* gfortran.dg/goacc/derived-chartypes-1.f90: Skip ICEing test.
* gfortran.dg/goacc/derived-chartypes-2.f90: Likewise.
libgomp/
PR fortran/98979
* testsuite/libgomp.oacc-fortran/array-stride-dt-1.f90: Add expected
errors.
OpenACC 3.0 ("2.14.4. Update Directive") states:
Noncontiguous subarrays may appear. It is implementation-specific
whether noncontiguous regions are updated by using one transfer for
each contiguous subregion, or whether the non-contiguous data is
packed, transferred once, and unpacked, or whether one or more larger
subarrays (no larger than the smallest contiguous region that contains
the specified subarray) are updated.
This patch relaxes some conditions in the Fortran front-end so that
strided accesses are permitted for update directives.
gcc/fortran/
* openmp.c (resolve_omp_clauses): Omit OpenACC update in
contiguity check and stride-specified error.
gcc/testsuite/
* gfortran.dg/goacc/array-with-dt-2.f90: New test.
libgomp/
* testsuite/libgomp.oacc-fortran/array-stride-dt-1.f90: New test.
On Wed, Jan 20, 2021 at 05:04:39PM +0100, Florian Weimer wrote:
> Sorry, this appears to cause OpenMP task state corruption in RPM. We
> have only seen this on s390x.
Haven't actually verified it, but my suspection is that this is a caller
stack corruption.
We play with fire with the GOMP_task API/ABI extensions, the GOMP_task
function used to be:
void
GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
long arg_size, long arg_align, bool if_clause, unsigned flags);
and later:
void
GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
long arg_size, long arg_align, bool if_clause, unsigned flags,
void **depend);
and later:
void
GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
long arg_size, long arg_align, bool if_clause, unsigned flags,
void **depend, int priority);
and now:
void
GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
long arg_size, long arg_align, bool if_clause, unsigned flags,
void **depend, int priority, void *detach)
and which of those depend, priority and detach argument is present depends
on the bits in flags.
I'm afraid the compiler just decided to spill the detach = NULL store in
if ((flags & GOMP_TASK_FLAG_DETACH) == 0)
detach = NULL;
on s390x into the argument stack slot. Not a problem if the caller passes
all those 10 arguments, but if not, can clobber random stack location.
This hack should fix it up. Priority doesn't need changing, but I've
changed it anyway just to be safe. With the patch none of the 3 arguments
are ever modified, so I'd hope gcc doesn't decide to spill something
unrelated there.
2021-01-20 Jakub Jelinek <jakub@redhat.com>
* task.c (GOMP_task): Rename priority argument to priority_arg,
add priority automatic variable and modify that variable. Instead of
clearing detach argument when GOMP_TASK_FLAG_DETACH bit is not set,
check flags for that bit.
This patch introduces gomp_sem_getcount wrapper, which uses sem_getvalue
for POSIX and atomic loads for linux futex and accel. rtems for now
remains broken.
2021-01-18 Jakub Jelinek <jakub@redhat.com>
* config/linux/sem.h (gomp_sem_getcount): New function.
* config/posix/sem.h (gomp_sem_getcount): New function.
* config/posix/sem.c (gomp_sem_getcount): New function.
* config/accel/sem.h (gomp_sem_getcount): New function.
* task.c (task_fulfilled_p): Use gomp_sem_getcount.
(omp_fulfill_event): Likewise.
The recent changes to error on mixing -march=i386 and -fcf-protection broke
bootstrap. This patch changes lib{atomic,gomp,itm} configury, so that it
only adds -march=i486 to flags if really needed (i.e. when 486 or later isn't
on by default already). Similarly, it will not use ifuncs if -mcx16
(or -march=i686 for 32-bit) is on by default.
2021-01-15 Jakub Jelinek <jakub@redhat.com>
PR target/70454
libatomic/
* configure.tgt: For i?86 and x86_64 determine if -march=i486 needs to
be added through preprocessor check on
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4. Determine if try_ifunc is needed
based on preprocessor check on __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16
or __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8.
libgomp/
* configure.tgt: For i?86 and x86_64 determine if -march=i486 needs to
be added through preprocessor check on
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4.
libitm/
* configure.tgt: For i?86 and x86_64 determine if -march=i486 needs to
be added through preprocessor check on
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4.
As recently again discussed in <https://gcc.gnu.org/PR97436> "[nvptx] -m32
support", nvptx offloading other than for 64-bit host has never been
implemented, tested, supported. So we simply should buildn't the nvptx libgomp
plugin in this case.
This avoids build problems if, for example, in a (standard) bi-arch
x86_64-pc-linux-gnu '-m64'/'-m32' build, libcuda is available only in a 64-bit
variant but not in a 32-bit one, which, for example, is the case if you build
GCC against the CUDA toolkit's 'stubs/libcuda.so' (see
<https://stackoverflow.com/a/52784819>).
This amends PR65099 commit a92defdab7 (r225560)
"[nvptx offloading] Only 64-bit configurations are currently supported" to
match the way we're doing this for the HSA/GCN plugins.
libgomp/
PR libgomp/65099
* plugin/configfrag.ac (PLUGIN_NVPTX): Restrict to supported
configurations.
* configure: Regenerate.
* plugin/plugin-nvptx.c (nvptx_get_num_devices): Remove 64-bit
check.
The libgomp texinfo docs lead to an invalid "up" link on the Top node,
which we can avoid similarly to the Top link in the main GCC manual.
2020-12-28 Sandra Loosemore <sandra@codesourcery.com>
libgomp/
* libgomp.texi (Top): Avoid bad "up" link.
The attached testcase is miscompiled, because we optimize shared clauses
to firstprivate when task body can't modify the variable even when the
task has depend clause. That is wrong, because firstprivate means the
variable will be copied immediately when the task is created, while with
depend clause some other task might change it later before the dependencies
are satisfied and the task should observe the value only after the change.
2020-12-18 Jakub Jelinek <jakub@redhat.com>
* gimplify.c (struct gimplify_omp_ctx): Add has_depend member.
(gimplify_scan_omp_clauses): Set it to true if OMP_CLAUSE_DEPEND
appears on OMP_TASK.
(gimplify_adjust_omp_clauses_1, gimplify_adjust_omp_clauses): Force
GOVD_WRITTEN on shared variables if task construct has depend clause.
* testsuite/libgomp.c/task-6.c: New test.
These are the same header files that exist in the Radeon Open Compute Runtime
project (as of October 2020), but they have been specially relicensed by AMD
for use in GCC.
The header files retain AMD copyright.
include/ChangeLog:
* hsa.h: Replace whole file.
* hsa_ext_amd.h: New file.
* hsa_ext_image.h: New file.
libgomp/ChangeLog:
* plugin/plugin-gcn.c: Include hsa_ext_amd.h.
(HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT): Delete redundant definition.
The change in major version (and the increment from Darwin19 to 20)
caused libtool tests to fail which resulted in incorrect build settings
for shared libraries.
We take this opportunity to sort out the shared undefined symbols state
rather than propagating the current unsound behaviour into a new rev.
This change means that we default to the case that missing symbols are
considered an error, and if one wants to allow this intentionally, the
confiuration for that case should be set appropriately.
Three existing cases need undefined dynamic lookup:
libitm, where there is already a configuration mechanism to add the
flags.
libcc1, where we add simple configuration to add the flags for Darwin.
libsanitizer, where we can add to the existing extra flags.
libcc1/ChangeLog:
PR target/97865
* Makefile.am: Add dynamic_lookup to LD flags for Darwin.
* configure.ac: Test for Darwin host and set a flag.
* Makefile.in: Regenerate.
* configure: Regenerate.
libitm/ChangeLog:
PR target/97865
* configure.tgt: Add dynamic_lookup to XLDFLAGS for Darwin.
* configure: Regenerate.
libsanitizer/ChangeLog:
PR target/97865
* configure.tgt: Add dynamic_lookup to EXTRA_CXXFLAGS for
Darwin.
* configure: Regenerate.
ChangeLog:
PR target/97865
* libtool.m4: Update handling of Darwin platform link flags
for Darwin20.
gcc/ChangeLog:
PR target/97865
* configure: Regenerate.
libatomic/ChangeLog:
PR target/97865
* configure: Regenerate.
libbacktrace/ChangeLog:
PR target/97865
* configure: Regenerate.
libffi/ChangeLog:
PR target/97865
* configure: Regenerate.
libgfortran/ChangeLog:
PR target/97865
* configure: Regenerate.
libgomp/ChangeLog:
PR target/97865
* configure: Regenerate.
libhsail-rt/ChangeLog:
PR target/97865
* configure: Regenerate.
libobjc/ChangeLog:
PR target/97865
* configure: Regenerate.
libphobos/ChangeLog:
PR target/97865
* configure: Regenerate.
libquadmath/ChangeLog:
PR target/97865
* configure: Regenerate.
libssp/ChangeLog:
PR target/97865
* configure: Regenerate.
libstdc++-v3/ChangeLog:
PR target/97865
* configure: Regenerate.
libvtv/ChangeLog:
PR target/97865
* configure: Regenerate.
zlib/ChangeLog:
PR target/97865
* configure: Regenerate.
The testcase had invalid assumptions about which loop iterations would run
first and last.
libgomp/ChangeLog
* testsuite/libgomp.oacc-fortran/atomic_capture-1.f90 (main): Adjust
expected results.
Ensure the code will continue to compile when elf.h gets these definitions.
libgomp/ChangeLog:
* plugin/plugin-gcn.c: Don't redefine relocations if elf.h has them.
(reserved): Delete unused define.
This removes the nest-var ICV, expressing nesting in terms of the
max-active-levels-var ICV instead. The max-active-levels-var ICV
is now per data environment rather than per device.
2020-11-18 Kwok Cheung Yeung <kcy@codesourcery.com>
libgomp/
* env.c (gomp_global_icv): Remove nest_var field. Add
max_active_levels_var field.
(gomp_max_active_levels_var): Remove.
(parse_boolean): Return true on success.
(handle_omp_display_env): Express OMP_NESTED in terms of
max_active_levels_var. Change format specifier for
max_active_levels_var.
(initialize_env): Set max_active_levels_var from
OMP_MAX_ACTIVE_LEVELS, OMP_NESTED, OMP_NUM_THREADS and
OMP_PROC_BIND.
* icv.c (omp_set_nested): Express in terms of
max_active_levels_var.
(omp_get_nested): Likewise.
(omp_set_max_active_levels): Use max_active_levels_var field instead
of gomp_max_active_levels_var.
(omp_get_max_active_levels): Likewise.
* libgomp.h (struct gomp_task_icv): Remove nest_var field. Add
max_active_levels_var field.
(gomp_supported_active_levels): Set to UCHAR_MAX.
(gomp_max_active_levels_var): Delete.
* libgomp.texi (omp_get_nested): Update documentation.
(omp_set_nested): Likewise.
(OMP_MAX_ACTIVE_LEVELS): Likewise.
(OMP_NESTED): Likewise.
(OMP_NUM_THREADS): Likewise.
(OMP_PROC_BIND): Likewise.
* parallel.c (gomp_resolve_num_threads): Replace reference
to nest_var with max_active_levels_var. Use max_active_levels_var
field instead of gomp_max_active_levels_var.
As typically configured, newlib's libc.a does not build 'posix' and,
hence, usleep is not available. Thus, use the same fallback as for nvptx.
libgomp/
* testsuite/libgomp.c/usleep.h (fallback_usleep): Renamed from
nvptx_usleep; use also for device={arch(gcn)}.
This patch adds support for custom allocators on private/firstprivate
clauses for task (and taskloop) constructs. Private didn't need anything
special, but firstprivate if it is passed by reference needs the GOMP_alloc
calls in the copyfn and GOMP_free in the task body.
2020-11-14 Jakub Jelinek <jakub@redhat.com>
* gimplify.c (gimplify_omp_for): Add OMP_CLAUSE_ALLOCATE_ALLOCATOR
decls as firstprivate on task clauses even when allocate clause
decl is not lastprivate.
* omp-low.c (install_var_field): Don't dereference omp_is_reference
types if mask is 33 rather than 1.
(scan_sharing_clauses): Populate allocate_map even for task
constructs. For now remove it back for variables mentioned in
reduction and in_reduction clauses on task/taskloop constructs
or on VLA task firstprivates. For firstprivate on task construct,
install the var field into field_map with by_ref and 33 instead
of false and 1 if mentioned in allocate clause.
(lower_private_allocate): Set TREE_THIS_NOTRAP on the created
MEM_REF.
(lower_rec_input_clauses): Handle allocate for task firstprivatized
non-VLA variables.
(create_task_copyfn): Likewise.
* testsuite/libgomp.c-c++-common/allocate-1.c (struct S): New type.
(foo): Add tests for non-VLA private and firstprivate clauses on
omp task.
(bar): Likewise. Remove taking of address from private/firstprivate
variables.
* testsuite/libgomp.c++/allocate-1.C (struct S): New type.
(foo): Add p, q, px and s arguments. Add tests for array reductions
and for non-VLA private and firstprivate clauses on omp task.
(bar): Removed.
(main): Adjust foo caller. Don't call bar.
Document status quo re PR94358 "[OMP] Privatize internal array variables
introduced by the Fortran FE".
libgomp/
PR fortran/94358
* testsuite/libgomp.oacc-fortran/pr94358-1.f90: New.
Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
This adds allocate clause support for array section reductions.
Furthermore, it fixes one bug that would cause inscan reductions with
allocate to be rejected by C, and for now just ignores allocate for
inscan/task reductions, that will need slightly more work.
2020-11-13 Jakub Jelinek <jakub@redhat.com>
gcc/
* omp-low.c (scan_sharing_clauses): For now remove for reduction
clauses with inscan or task modifiers decl from allocate_map.
(lower_private_allocate): Handle TYPE_P (new_var).
(lower_rec_input_clauses): Handle allocate clause for C/C++ array
reductions.
gcc/c/
* c-typeck.c (c_finish_omp_clauses): Don't clear
OMP_CLAUSE_REDUCTION_INSCAN unless reduction_seen == -2.
libgomp/
* testsuite/libgomp.c-c++-common/allocate-1.c (foo): Add tests
for array reductions.
(main): Adjust foo callers.
For now, task/taskloop constructs aren't handled and C/C++ array reductions
and reductions with task or inscan modifiers need further work.
Instead of calling omp_alloc/omp_free (where the former doesn't have
alignment argument and omp_aligned_alloc is 5.1 only feature), this calls
GOMP_alloc/GOMP_free, so that the library can fail if it would fall back
into NULL (exception is zero length allocations).
2020-11-12 Jakub Jelinek <jakub@redhat.com>
gcc/
* builtin-types.def (BT_FN_PTR_SIZE_SIZE_PTRMODE): New function type.
* omp-builtins.def (BUILT_IN_GOACC_DECLARE): Move earlier.
(BUILT_IN_GOMP_ALLOC, BUILT_IN_GOMP_FREE): New builtins.
* gimplify.c (gimplify_scan_omp_clauses): Force allocator into a
decl if it is not NULL, INTEGER_CST or decl.
(gimplify_adjust_omp_clauses): Clear GOVD_EXPLICIT on explicit clauses
which are being removed. Remove allocate clauses for variables not seen
if they are private, firstprivate or linear too. Call
omp_notice_variable on the allocator otherwise.
(gimplify_omp_for): Handle iterator vars mentioned in allocate clauses
similarly to non-is_gimple_reg iterators.
* omp-low.c (struct omp_context): Add allocate_map field.
(delete_omp_context): Delete it.
(scan_sharing_clauses): Fill it from allocate clauses. Remove it
if mentioned also in shared clause.
(lower_private_allocate): New function.
(lower_rec_input_clauses): Handle allocate clause for privatized
variables, except for task/taskloop, C/C++ array reductions for now
and task/inscan variables.
(lower_send_shared_vars): Don't consider variables in allocate_map
as shared.
* omp-expand.c (expand_omp_for_generic, expand_omp_for_static_nochunk,
expand_omp_for_static_chunk): Use expand_omp_build_assign instead of
gimple_build_assign + gsi_insert_after.
* builtins.c (builtin_fnspec): Handle BUILTIN_GOMP_ALLOC and
BUILTIN_GOMP_FREE.
* tree-ssa-ccp.c (evaluate_stmt): Handle BUILTIN_GOMP_ALLOC.
* tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Handle
BUILTIN_GOMP_ALLOC.
(mark_all_reaching_defs_necessary_1): Handle BUILTIN_GOMP_ALLOC
and BUILTIN_GOMP_FREE.
(propagate_necessity): Likewise.
gcc/fortran/
* f95-lang.c (ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LIST):
Define.
(gfc_init_builtin_functions): Add alloc_size and warn_unused_result
attributes to __builtin_GOMP_alloc.
* types.def (BT_PTRMODE): New primitive type.
(BT_FN_VOID_PTR_PTRMODE, BT_FN_PTR_SIZE_SIZE_PTRMODE): New function
types.
libgomp/
* libgomp.map (GOMP_alloc, GOMP_free): Export at GOMP_5.0.1.
* omp.h.in (omp_alloc): Add malloc and alloc_size attributes.
* libgomp_g.h (GOMP_alloc, GOMP_free): Declare.
* allocator.c (omp_aligned_alloc): New for now static function,
add alignment argument and handle it.
(omp_alloc): Reimplement using omp_aligned_alloc.
(GOMP_alloc, GOMP_free): New functions.
(omp_free): Add ialias.
* testsuite/libgomp.c-c++-common/allocate-1.c: New test.
* testsuite/libgomp.c++/allocate-1.C: New test.
This patch implements some parts of the target variable mapping changes
specified in OpenMP 5.0, including base-pointer attachment/detachment
behavior for array section list-items in map clauses, and ordering of
map clauses according to map kind.
2020-11-10 Chung-Lin Tang <cltang@codesourcery.com>
gcc/c-family/ChangeLog:
* c-common.h (c_omp_adjust_map_clauses): New declaration.
* c-omp.c (struct map_clause): Helper type for c_omp_adjust_map_clauses.
(c_omp_adjust_map_clauses): New function.
gcc/c/ChangeLog:
* c-parser.c (c_parser_omp_target_data): Add use of
new c_omp_adjust_map_clauses function. Add GOMP_MAP_ATTACH_DETACH as
handled map clause kind.
(c_parser_omp_target_enter_data): Likewise.
(c_parser_omp_target_exit_data): Likewise.
(c_parser_omp_target): Likewise.
* c-typeck.c (handle_omp_array_sections): Adjust COMPONENT_REF case to
use GOMP_MAP_ATTACH_DETACH map kind for C_ORT_OMP region type.
(c_finish_omp_clauses): Adjust bitmap checks to allow struct decl and
same struct field access to co-exist on OpenMP construct.
gcc/cp/ChangeLog:
* parser.c (cp_parser_omp_target_data): Add use of
new c_omp_adjust_map_clauses function. Add GOMP_MAP_ATTACH_DETACH as
handled map clause kind.
(cp_parser_omp_target_enter_data): Likewise.
(cp_parser_omp_target_exit_data): Likewise.
(cp_parser_omp_target): Likewise.
* semantics.c (handle_omp_array_sections): Adjust COMPONENT_REF case to
use GOMP_MAP_ATTACH_DETACH map kind for C_ORT_OMP region type. Fix
interaction between reference case and attach/detach.
(finish_omp_clauses): Adjust bitmap checks to allow struct decl and
same struct field access to co-exist on OpenMP construct.
gcc/ChangeLog:
* gimplify.c (is_or_contains_p): New static helper function.
(omp_target_reorder_clauses): New function.
(gimplify_scan_omp_clauses): Add use of omp_target_reorder_clauses to
reorder clause list according to OpenMP 5.0 rules. Add handling of
GOMP_MAP_ATTACH_DETACH for OpenMP cases.
* omp-low.c (is_omp_target): New static helper function.
(scan_sharing_clauses): Add scan phase handling of GOMP_MAP_ATTACH/DETACH
for OpenMP cases.
(lower_omp_target): Add lowering handling of GOMP_MAP_ATTACH/DETACH for
OpenMP cases.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/clauses-2.c: Remove dg-error cases now valid.
* gfortran.dg/gomp/map-2.f90: Likewise.
* c-c++-common/gomp/map-5.c: New testcase.
libgomp/ChangeLog:
* libgomp.h (enum gomp_map_vars_kind): Adjust enum values to be bit-flag
usable.
* oacc-mem.c (acc_map_data): Adjust gomp_map_vars argument flags to
'GOMP_MAP_VARS_OPENACC | GOMP_MAP_VARS_ENTER_DATA'.
(goacc_enter_datum): Likewise for call to gomp_map_vars_async.
(goacc_enter_data_internal): Likewise.
* target.c (gomp_map_vars_internal):
Change checks of GOMP_MAP_VARS_ENTER_DATA to use bit-and (&). Adjust use
of gomp_attach_pointer for OpenMP cases.
(gomp_exit_data): Add handling of GOMP_MAP_DETACH.
(GOMP_target_enter_exit_data): Add handling of GOMP_MAP_ATTACH.
* testsuite/libgomp.c-c++-common/ptr-attach-1.c: New testcase.
Avoid code duplication, and better test what we expect to happen.
libgomp/
PR target/85486
* testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: Simplify and enhance.
* testsuite/libgomp.oacc-c-c++-common/pr85486-3.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/pr85486.c: Likewise.
This changes makes 'dg-warning', 'dg-error', 'dg-bogus', 'dg-message' behave as
expected, and also enables use of relative line numbers as well as 'dg-line'.
libgomp/
PR testsuite/80219
PR testsuite/85303
* testsuite/lib/libgomp.exp (libgomp_init): Set
'gcc_warning_prefix', 'gcc_error_prefix'.
This marks all variants of declare variant also declare target if the base
functions are called directly in target regions or declare target functions.
2020-10-28 Jakub Jelinek <jakub@redhat.com>
gcc/
* omp-offload.c (omp_declare_target_tgt_fn_r): Handle direct calls to
declare variant base functions.
libgomp/
* testsuite/libgomp.c/target-42.c: New test.
With the patch I've posted today to fix up declare variant LTO handling,
Tobias reported the patch still doesn't work, and there are two
reasons for that.
One is that when the base function is marked implicitly as declare target,
we don't mark also implicitly the variants. I'll need to ask on omp-lang
about details for that, but generally the compiler should do it some way.
The other one is that the way base_delay is written, it will always
call the usleep function, which is undesirable for nvptx. While the
compiler will replace all direct calls to base_delay to nvptx_delay,
the base_delay definition which calls usleep stays.
2020-10-28 Jakub Jelinek <jakub@redhat.com>
Tom de Vries <tdevries@suse.de>
PR testsuite/81690
* testsuite/libgomp.c/usleep.h: New file.
* testsuite/libgomp.c/target-32.c: Include usleep.h.
(main): Use tgt_usleep instead of usleep.
* testsuite/libgomp.c/thread-limit-2.c: Include usleep.h.
(main): Use tgt_usleep instead of usleep.
> I've tried to add the saving/restoring next to ipa refs saving/restoring, as
> the declare variant alt stuff is kind of extension of those, unfortunately
> following doesn't compile, because I need to also write or read a tree there
> (ctx is a portion of DECL_ATTRIBUTES of the base function), but the ipa refs
> write/read back functions don't have arguments that can be used for that.
This patch adds the streaming out and in of those omp_declare_variant_alt
hash table on the side data for the declare_variant_alt cgraph_nodes and
treats for LTO purposes the declare_variant_alt nodes (which have no body)
as if they contained a body that calls all the possible variants.
After IPA all the calls to these magic declare_variant_alt calls are
replaced with call to one of the variant depending on which one has the
highest score in the context.
2020-10-28 Jakub Jelinek <jakub@redhat.com>
PR lto/96680
gcc/
* lto-streamer.h (omp_lto_output_declare_variant_alt,
omp_lto_input_declare_variant_alt): Declare variant.
* symtab.c (symtab_node::get_partitioning_class): Return
SYMBOL_DUPLICATE for declare_variant_alt nodes.
* passes.c (ipa_write_summaries): Add declare_variant_alt to
partition.
* lto-cgraph.c (output_refs): Call omp_lto_output_declare_variant_alt
on declare_variant_alt nodes.
(input_refs): Call omp_lto_input_declare_variant_alt on
declare_variant_alt nodes.
* lto-streamer-out.c (output_function): Don't call
collect_block_tree_leafs if DECL_INITIAL is error_mark_node.
(lto_output): Call output_function even for declare_variant_alt
nodes.
* omp-general.c (omp_lto_output_declare_variant_alt,
omp_lto_input_declare_variant_alt): New functions.
gcc/lto/
* lto-common.c (lto_fixup_prevailing_decls): Don't use
LTO_NO_PREVAIL on TREE_LIST's TREE_PURPOSE.
* lto-partition.c (lto_balanced_map): Treat declare_variant_alt
nodes like definitions.
libgomp/
* testsuite/libgomp.c/declare-variant-1.c: New test.
> Therefore, I think until omp_get_initial_device () value is changed, we
The following so far untested patch implements that change.
OpenMP 4.5 said for omp_get_initial_device:
The value of the device number is implementation defined. If it is between 0 and one less than
omp_get_num_devices() then it is valid for use with all device constructs and routines; if it is
outside that range, then it is only valid for use with the device memory routines and not in the
device clause.
and OpenMP 5.0 similarly, but OpenMP 5.1 says:
The value of the device number is the value returned by the omp_get_num_devices routine.
As the new value is compatible with what has been required earlier, I think
we can change it already now.
2020-10-22 Jakub Jelinek <jakub@redhat.com>
* icv.c (omp_get_initial_device): Remove including corresponding
ialias.
* icv-device.c (omp_get_initial_device): New function. Return
gomp_get_num_devices (). Add ialias.
* target.c (resolve_device): Don't fail with
OMP_TARGET_OFFLOAD=mandatory if device_id is equal to
gomp_get_num_devices ().
(omp_target_alloc, omp_target_free, omp_target_is_present,
omp_target_memcpy, omp_target_memcpy_rect, omp_target_associate_ptr,
omp_target_disassociate_ptr, omp_pause_resource): Use
gomp_get_num_devices () instead of GOMP_DEVICE_HOST_FALLBACK on the
first use in the functions, in uses dominated by the
gomp_get_num_devices call use num_devices_openmp instead.
* libgomp.texi (omp_get_initial_device): Document.
* config/gcn/icv-device.c (omp_get_initial_device): New function.
Add ialias.
* config/nvptx/icv-device.c (omp_get_initial_device): Likewise.
* testsuite/libgomp.c/target-40.c: New test.
> the patch also breaks bootstrap on both i386-pc-solaris2.11 and
> sparc-sun-solaris2.11:
>
> /vol/gcc/src/hg/master/local/libgomp/env.c: In function 'initialize_env':
> /vol/gcc/src/hg/master/local/libgomp/env.c:414:16: error: 'new_offload' may be used uninitialized in this function [-Werror=maybe-uninitialized]
> 414 | *offload = new_offload;
> | ~~~~~~~~~^~~~~~~~~~~~~
> /vol/gcc/src/hg/master/local/libgomp/env.c:384:30: note: 'new_offload' was declared here
> 384 | enum gomp_target_offload_t new_offload;
> | ^~~~~~~~~~~
I can't reproduce that, but I fail to see why we need two separate
variables, one with actual value and one tracking if the value is valid.
So, I'm going with:
2020-10-21 Jakub Jelinek <jakub@redhat.com>
* env.c (parse_target_offload): Change new_offload var type to int,
preinitialize to -1, remove found var and test new_offload != -1
instead of found.
> On 10/20/20 2:11 PM, Tobias Burnus wrote:
>
> > Unfortunately, the committed patch
> > (r11-4121-g1bfc07d150790fae93184a79a7cce897655cb37b)
> > causes build errors.
> >
> > The error seems to be provoked by function cloning – as the code
> > itself looks fine:
> > ...
> > struct gomp_device_descr *devices_s
> > = malloc (num_devices * sizeof (struct gomp_device_descr));
> > ...
> > for (i = 0; i < num_devices; i++)
> > if (!(devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
> > devices_s[num_devices_after_openmp++] = devices[i];
>
> gomp_target_init.part.0 ()
> {
> ...
> <bb 2>
> devices_s_1 = malloc (0);
> ...
> num_devices.16_67 = num_devices;
> ...
> if (num_devices.16_67 > 0)
> goto <bb 3>; [89.00%]
> else
> goto <bb 18>; [11.00%]
>
> Which seems to have an ordering problem.
This patch fixes the warning that breaks the bootstrap.
2020-10-20 Jakub Jelinek <jakub@redhat.com>
* target.c (gomp_target_init): Inside of the function, use automatic
variables corresponding to num_devices, num_devices_openmp and devices
global variables and update the globals only at the end of the
function.
This implements support for the OMP_TARGET_OFFLOAD environment variable
introduced in the OpenMP 5.0 standard, which controls how offloading
is handled. It may be set to MANDATORY (abort if offloading cannot be
performed), DISABLED (no offloading to devices) or DEFAULT (offload to
device if possible, fall back to host if not).
2020-10-20 Kwok Cheung Yeung <kcy@codesourcery.com>
libgomp/
* env.c (gomp_target_offload_var): New.
(parse_target_offload): New.
(handle_omp_display_env): Print value of OMP_TARGET_OFFLOAD.
(initialize_env): Parse OMP_TARGET_OFFLOAD.
* libgomp.h (gomp_target_offload_t): New.
(gomp_target_offload_var): New.
* libgomp.texi (OMP_TARGET_OFFLOAD): New section.
* target.c (resolve_device): Generate error if device not found and
offloading is mandatory.
(gomp_target_fallback): Generate error if offloading is mandatory.
(GOMP_target): Add argument in call to gomp_target_fallback.
(GOMP_target_ext): Likewise.
(gomp_target_data_fallback): Generate error if offloading is mandatory.
(GOMP_target_data): Add argument in call to gomp_target_data_fallback.
(GOMP_target_data_ext): Likewise.
(gomp_target_task_fn): Add argument in call to gomp_target_fallback.
(gomp_target_init): Return early if offloading is disabled.