Commit Graph

203 Commits

Author SHA1 Message Date
Tobias Burnus
33c4e46624 Add 'default' to -foffload=; document that flag [PR67300]
As -foffload={options,targets,targets=options} is very convoluted,
it has been split into -foffload=targets (supporting the old syntax
for backward compatibility) and -foffload-options={options,target=options}.

Only the new syntax is documented.

Additionally, -foffload=default is supported, which can reset the
devices after -foffload=disable / -foffload=targets to the default,
if needed.

gcc/ChangeLog:

	PR other/67300
	* common.opt (-foffload=): Update description.
	(-foffload-options=): New.
	* doc/invoke.texi (C Language Options): Document
	-foffload and -foffload-options.
	* gcc.c (check_offload_target_name): New, split off from
	handle_foffload_option.
	(check_foffload_target_names): New.
	(handle_foffload_option): Handle -foffload=default.
	(driver_handle_option): Update for -foffload-options.
	* lto-opts.c (lto_write_options): Use -foffload-options
	instead of -foffload.
	* lto-wrapper.c (merge_and_complain, append_offload_options):
	Likewise.
	* opts.c (common_handle_option): Likewise.

libgomp/ChangeLog:

	PR other/67300
	* testsuite/libgomp.c-c++-common/reduction-16.c: Replace
	-foffload=nvptx-none= by -foffload-options=nvptx-none= to
	avoid disabling other offload targets.
	* testsuite/libgomp.c-c++-common/reduction-5.c: Likewise.
	* testsuite/libgomp.c-c++-common/reduction-6.c: Likewise.
	* testsuite/libgomp.c/target-44.c: Likewise.
2021-06-29 16:00:04 +02:00
Thomas Schwinge
abf937ac00 'libgomp.c/target-44.c': Restrict '-latomic' to nvptx offloading compilation
Fix-up for recent commit f87990a2a8
"[openmp, simt] Disable SIMT for user-defined reduction"; see commit
d42088e453 "Avoid -latomic for amdgcn
offloading".

	libgomp/
	* testsuite/libgomp.c/target-44.c: Restrict '-latomic' to nvptx
	offloading compilation.
2021-05-18 12:57:35 +02:00
Martin Liska
810afb0b5f testsuite: prune new LTO warning
libgomp/ChangeLog:

	PR testsuite/100569
	* testsuite/libgomp.c/omp-nested-3.c: Prune new LTO warning.
	* testsuite/libgomp.c/pr46032-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c: Likewise.

gcc/testsuite/ChangeLog:

	PR testsuite/100569
	* gcc.dg/atomic/c11-atomic-exec-2.c: Prune new LTO warning.
	* gcc.dg/torture/pr94947-1.c: Likewise.
2021-05-13 09:24:23 +02:00
Jakub Jelinek
98acbb3111 openmp: Fix up taskloop reduction ICE if taskloop has no iterations [PR100471]
When a taskloop doesn't have any iterations, GOMP_taskloop* takes an early
return, doesn't create any tasks and more importantly, doesn't create
a taskgroup and doesn't register task reductions.  But, the code emitted
in the callers assumes task reductions have been registered and performs
the reduction handling and task reduction unregistration.  The pointer
to the task reduction private variables is reused, on input it is the alignment
and only on output it is the pointer, so in the case of a taskloop with no iterations
the caller attempts to dereference the alignment value as if it were a pointer
and crashes.  We could in the early returns register the task reductions
only to have them looped over and unregistered in the caller, but I think
it is better to tell the caller there is nothing to task reduce and bypass
all that.
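
A minimal sketch of the kind of user code that reaches this path, assuming
a compiler with OpenMP taskloop reduction support.  The function name
sum_range is hypothetical and is not taken from task-reduction-4.c.  When
lo >= hi the taskloop has no iterations, which is the early-return case
described above.

```c
/* Hypothetical reproducer sketch: a taskloop reduction whose iteration
   space may be empty.  Without -fopenmp the pragmas are ignored and the
   loop simply runs serially, giving the same result.  */
int
sum_range (int lo, int hi)
{
  int sum = 0;
#pragma omp parallel
#pragma omp single
  {
#pragma omp taskloop reduction (+: sum)
    for (int i = lo; i < hi; i++)
      sum += i;
  }
  return sum;
}
```

Calling sum_range (0, 0) is the interesting case: no tasks are created, so
the caller has nothing to reduce and the result should simply stay 0.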

2021-05-11  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/100471
	* omp-low.c (lower_omp_task_reductions): For OMP_TASKLOOP, if data
	is 0, bypass the reduction loop including
	GOMP_taskgroup_reduction_unregister call.

	* taskloop.c (GOMP_taskloop): If GOMP_TASK_FLAG_REDUCTION and not
	GOMP_TASK_FLAG_NOGROUP, when doing early return clear the task
	reduction pointer.
	* testsuite/libgomp.c/task-reduction-4.c: New test.
2021-05-11 09:07:47 +02:00
Tom de Vries
f87990a2a8 [openmp, simt] Disable SIMT for user-defined reduction
The test-case included in this patch contains this target region:
...
  for (int i0 = 0 ; i0 < N0 ; i0++ )
    counter_N0.i += 1;
...

When running with nvptx accelerator, the counter variable is expected to
be N0 after the region, but instead is N0 / 32.  The problem is that rather
than getting the result for all warp lanes, we get it for just one lane.

This is caused by the implementation of SIMT being incomplete.  It handles
regular reductions, but apparently not user-defined reductions.

For now, handle this by disabling SIMT in this case, specifically by setting
sctx->max_vf to 1.
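
The N0 / 32 result can be illustrated with a small host-side model.  This
is only a sketch under assumptions: WARP_SIZE, simt_count and the
combine_lanes flag are hypothetical names, and the code models the lanes
with an ordinary array rather than running on nvptx.

```c
#define WARP_SIZE 32  /* assumed number of lanes in one nvptx warp */

/* Model one warp executing "counter += 1" for n iterations.  Each lane
   handles every WARP_SIZE-th iteration and accumulates into its own
   private partial count.  */
int
simt_count (int n, int combine_lanes)
{
  int partial[WARP_SIZE] = { 0 };

  for (int lane = 0; lane < WARP_SIZE; lane++)
    for (int i = lane; i < n; i += WARP_SIZE)
      partial[lane] += 1;

  /* Missing reduction across lanes: only one lane's result survives.  */
  if (!combine_lanes)
    return partial[0];

  /* Complete reduction: combine the partial counts of all lanes.  */
  int total = 0;
  for (int lane = 0; lane < WARP_SIZE; lane++)
    total += partial[lane];
  return total;
}
```

With n == 1024 the model returns 32 when the lanes are not combined and
1024 when they are.  Setting max_vf to 1 corresponds to a single lane
doing all iterations, so no cross-lane combination is needed.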

Tested libgomp on x86_64-linux with nvptx accelerator.

gcc/ChangeLog:

2021-05-03  Tom de Vries  <tdevries@suse.de>

	PR target/100321
	* omp-low.c (lower_rec_input_clauses): Disable SIMT for user-defined
	reduction.

libgomp/ChangeLog:

2021-05-03  Tom de Vries  <tdevries@suse.de>

	PR target/100321
	* testsuite/libgomp.c/target-44.c: New test.
2021-05-03 23:13:59 +02:00
Tom de Vries
fc14ff6111 [omp, simt] Handle alternative IV
Consider the test-case libgomp.c/pr81778.c added in this commit, with
this core loop (note: CANARY_SIZE set to 0 for simplicity):
...
  int s = 1;
  #pragma omp target simd
  for (int i = N - 1; i > -1; i -= s)
    a[i] = 1;
...
which, given that N is 32, sets a[0..31] to 1.

After omp-expand, this looks like:
...
  <bb 5> :
  simduid.7 = .GOMP_SIMT_ENTER (simduid.7);
  .omp_simt.8 = .GOMP_SIMT_ENTER_ALLOC (simduid.7);
  D.3193 = -s;
  s.9 = s;
  D.3204 = .GOMP_SIMT_LANE ();
  D.3205 = -s.9;
  D.3206 = (int) D.3204;
  D.3207 = D.3205 * D.3206;
  i = D.3207 + 31;
  D.3209 = 0;
  D.3210 = -s.9;
  D.3211 = D.3210 - i;
  D.3210 = -s.9;
  D.3212 = D.3211 / D.3210;
  D.3213 = (unsigned int) D.3212;
  D.3213 = i >= 0 ? D.3213 : 0;

  <bb 19> :
  if (D.3209 < D.3213)
    goto <bb 6>; [87.50%]
  else
    goto <bb 7>; [12.50%]

  <bb 6> :
  a[i] = 1;
  D.3215 = -s.9;
  D.3219 = .GOMP_SIMT_VF ();
  D.3216 = (int) D.3219;
  D.3220 = D.3215 * D.3216;
  i = D.3220 + i;
  D.3209 = D.3209 + 1;
  goto <bb 19>; [100.00%]
...

On nvptx, the first time bb6 is executed, i is in the 0..31 range (depending
on the lane that is executing) at bb entry.

So we have the following sequence:
- a[0..31] is set to 1
- i is updated to -32..-1
- D.3209 is updated to 1 (being 0 initially)
- bb19 is executed, and if condition (D.3209 < D.3213) == (1 < 32) evaluates
  to true
- bb6 is once more executed, which should not happen because all the elements
  that needed to be handled were already handled.
- consequently, elements that should not be written are written
- with CANARY_SIZE == 0, we may run into a libgomp error:
  ...
  libgomp: cuCtxSynchronize error: an illegal memory access was encountered
  ...
  and with CANARY_SIZE unmodified, we run into:
  ...
  Expected 0, got 1 at base[-961]
  Aborted (core dumped)
  ...

The cause of this is as follows:
- because the step s is a variable rather than a constant, an alternative
  IV (D.3209 in our example) is generated in expand_omp_simd, and the
  loop condition is tested in terms of the alternative IV rather than
  the original IV (i in our example).
- the SIMT code in expand_omp_simd works by modifying step and initial value.
- The initial value fd->loop.n1 is loaded into a variable n1, which is
  modified by the SIMT code and then used there-after.
- The step fd->loop.step is loaded into a variable step, which is modified
  by the SIMT code, but afterwards there are uses of both step and
  fd->loop.step.
- There are uses of fd->loop.step in the alternative IV handling code,
  which should use step instead.

Fix this by introducing an additional variable orig_step, which is not
modified by the SIMT code and replacing all remaining uses of fd->loop.step
by either step or orig_step.
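
The effect of mixing the two steps can be reproduced with a host-side
model of the IL above.  This is a sketch under assumptions, not the GCC
code: lowest_index_written and its flag are hypothetical, and the store
a[i] = 1 is replaced by recording the lowest index that would be written.

```c
#define N 32
#define SIMT_VF 32  /* lanes per warp, as in the example above */

/* Return the lowest array index any lane would write.  If
   use_simt_step_for_count is zero, the trip count of the alternative IV
   is derived from the original step -s while i advances by -s * SIMT_VF
   (the bug).  Otherwise both use the SIMT-adjusted step (the fix).  */
int
lowest_index_written (int s, int use_simt_step_for_count)
{
  int orig_step = -s;
  int step = orig_step * SIMT_VF;
  int lowest = N;

  for (int lane = 0; lane < SIMT_VF; lane++)
    {
      /* Per-lane initial value, as computed in bb 5.  */
      int i = (N - 1) + orig_step * lane;
      int count_step = use_simt_step_for_count ? step : orig_step;
      int count = i >= 0 ? (count_step - i) / count_step : 0;

      for (int alt_iv = 0; alt_iv < count; alt_iv++)
        {
          if (i < lowest)
            lowest = i;   /* stands in for a[i] = 1 */
          i += step;
        }
    }
  return lowest;
}
```

For s == 1 the buggy variant reaches index -961, matching the base[-961]
report above, while the fixed variant never goes below index 0.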

Built on x86_64-linux with nvptx accelerator, tested libgomp.

This fixes for-5.c and for-6.c FAILs I'm currently seeing on a quadro m1200
with driver 450.66.

gcc/ChangeLog:

2020-10-02  Tom de Vries  <tdevries@suse.de>

	* omp-expand.c (expand_omp_simd): Add orig_step, and replace uses of
	fd->loop.step by either step or orig_step.

libgomp/ChangeLog:

2020-10-02  Tom de Vries  <tdevries@suse.de>

	* testsuite/libgomp.c/pr81778.c: New test.
2021-04-29 14:37:32 +02:00
Tom de Vries
4d7c874e2c [omp, simt] Fix expand_GOMP_SIMT_*
When running the test-case included in this patch using an
nvptx accelerator, it fails in execution.

The problem is that the expansion of GOMP_SIMT_XCHG_BFLY is optimized away
during pass_jump as "trivially dead insns".

This is caused by this code in expand_GOMP_SIMT_XCHG_BFLY:
...
  class expand_operand ops[3];
  create_output_operand (&ops[0], target, mode);
  ...
  expand_insn (targetm.code_for_omp_simt_xchg_bfly, 3, ops);
...
which doesn't guarantee that target is assigned to by the expanded insn.

For instance, if target is:
...
(gdb) call debug_rtx ( target )
(subreg/s/u:QI (reg:SI 40 [ _61 ]) 0)
...
then after expand_insn, we have:
...
(gdb) call debug_rtx ( ops[0].value )
(reg:QI 57)
...

See commit 3af3bec2e4 "internal-fn: Avoid dropping the lhs of some
calls [PR94941]" for a similar problem.

Fix this in the same way, by adding:
...
  if (!rtx_equal_p (target, ops[0].value))
    emit_move_insn (target, ops[0].value);
...
where applicable in the expand_GOMP_SIMT_* functions.

Tested libgomp on x86_64 with nvptx accelerator.

gcc/ChangeLog:

2021-04-28  Tom de Vries  <tdevries@suse.de>

	PR target/100232
	* internal-fn.c (expand_GOMP_SIMT_ENTER_ALLOC)
	(expand_GOMP_SIMT_LAST_LANE, expand_GOMP_SIMT_ORDERED_PRED)
	(expand_GOMP_SIMT_VOTE_ANY, expand_GOMP_SIMT_XCHG_BFLY)
	(expand_GOMP_SIMT_XCHG_IDX): Ensure target is assigned to.
2021-04-29 09:55:15 +02:00
Tobias Burnus
95dfc3ac7b libgomp/testsuite: Fix checks for dg-excess-errors
For the tests modified below, the effective target line has to be effective
when compiling for an offload target, except that variable-not-offloaded.c
would compile with unified-shared memory and pr86416-*.c if long double/float128
is supported.
The previous check used a run-time device ability check. This new variant
now enables those dg- lines when _compiling_ for nvptx or gcn.

libgomp/ChangeLog:

	* testsuite/lib/libgomp.exp (offload_target_to_openacc_device_type):
	New, based on check_effective_target_offload_target_nvptx.
	(check_effective_target_offload_target_nvptx): Call it.
	(check_effective_target_offload_target_amdgcn): New.
	* testsuite/libgomp.c-c++-common/function-not-offloaded.c:
	Require target offload_target_nvptx || offload_target_amdgcn.
	* testsuite/libgomp.c-c++-common/variable-not-offloaded.c: Likewise.
	* testsuite/libgomp.c/pr86416-1.c: Likewise.
	* testsuite/libgomp.c/pr86416-2.c: Likewise.
2021-04-21 20:07:19 +02:00
Thomas Schwinge
4dd9e1c541 XFAIL OpenMP/nvptx execution-time hangs for simple nested OpenMP 'target'/'parallel'/'task' constructs [PR99555]
... still awaiting proper resolution, of course.

	libgomp/
	PR target/99555
	* testsuite/lib/libgomp.exp
	(check_effective_target_offload_device_nvptx): New.
	* testsuite/libgomp.c/pr99555-1.c <nvptx offload device>: Until
	resolved, make sure that we exit quickly, with error status,
	XFAILed.
	* testsuite/libgomp.c-c++-common/task-detach-6.c: Likewise.
	* testsuite/libgomp.fortran/task-detach-6.f90: Likewise.
2021-04-15 11:13:27 +02:00
Tobias Burnus
d579e2e76f libgomp: Fix on_device_arch.c aux-file handling [PR99555]
libgomp/ChangeLog:

	PR target/99555
	* testsuite/lib/on_device_arch.c: Move to ...
	* testsuite/libgomp.c-c++-common/on_device_arch.h: ... here.
	* testsuite/libgomp.fortran/on_device_arch.c: New file;
	#include on_device_arch.h.
	* testsuite/libgomp.c-c++-common/task-detach-6.c: #include
	on_device_arch.h instead of using dg-additional-source.
	* testsuite/libgomp.c/pr99555-1.c: Likewise.
	* testsuite/libgomp.fortran/task-detach-6.f90: Update to use
	on_device_arch.c without relative paths.
2021-03-29 10:40:38 +02:00
Thomas Schwinge
d99111fd8e Avoid OpenMP/nvptx execution-time hangs for simple nested OpenMP 'target'/'parallel'/'task' constructs [PR99555]
... awaiting proper resolution, of course.

	libgomp/
	PR target/99555
	* testsuite/lib/on_device_arch.c: New file.
	* testsuite/libgomp.c/pr99555-1.c: Likewise.
	* testsuite/libgomp.c-c++-common/task-detach-6.c: Until resolved,
	skip for nvptx offloading, with error status.
	* testsuite/libgomp.fortran/task-detach-6.f90: Likewise.
2021-03-25 13:00:11 +01:00
Jakub Jelinek
99dee82307 Update copyright years.
2021-01-04 10:26:59 +01:00
Jakub Jelinek
8b60459465 openmp: Don't optimize shared to firstprivate on task with depend clause
The attached testcase is miscompiled, because we optimize shared clauses
to firstprivate when task body can't modify the variable even when the
task has depend clause.  That is wrong, because firstprivate means the
variable will be copied immediately when the task is created, while with
depend clause some other task might change it later before the dependencies
are satisfied and the task should observe the value only after the change.
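
A minimal sketch of the pattern, not the contents of task-6.c; the
function name observe_after_dependency is hypothetical.  The second task
only reads x, so it is a candidate for the shared to firstprivate
optimization, but its depend clause means it must see the value written
by the first task.

```c
/* Hypothetical sketch: a reader task that depends on a writer task.
   Without -fopenmp the pragmas are ignored and the statements run in
   program order, giving the same result.  */
int
observe_after_dependency (void)
{
  int x = 1;
  int seen = 0;
#pragma omp parallel
#pragma omp single
  {
    /* Writer: changes x after the reader task may already exist.  */
#pragma omp task shared (x) depend (out: x)
    x = 2;

    /* Reader: must observe x only once the dependency is satisfied.  */
#pragma omp task shared (x, seen) depend (in: x)
    seen = x;

#pragma omp taskwait
  }
  return seen;
}
```

If x were copied when the reader task is created, seen could end up as 1
instead of the expected 2.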

2020-12-18  Jakub Jelinek  <jakub@redhat.com>

	* gimplify.c (struct gimplify_omp_ctx): Add has_depend member.
	(gimplify_scan_omp_clauses): Set it to true if OMP_CLAUSE_DEPEND
	appears on OMP_TASK.
	(gimplify_adjust_omp_clauses_1, gimplify_adjust_omp_clauses): Force
	GOVD_WRITTEN on shared variables if task construct has depend clause.

	* testsuite/libgomp.c/task-6.c: New test.
2020-12-18 21:43:20 +01:00
Tobias Burnus
cb1a4876a0 testsuite/libgomp.c/usleep.h: Use sleep-loop also for GCN
As typically configured, newlib's libc.a does not build 'posix' and,
hence, usleep is not available. Thus, use the same fallback as for nvptx.

libgomp/
	* testsuite/libgomp.c/usleep.h (fallback_usleep): Renamed from
	nvptx_usleep; use also for device={arch(gcn)}.
2020-11-18 14:11:27 +01:00
Kwok Cheung Yeung
10508db867 openmp: Mark deprecated symbols in OpenMP 5.0
2020-11-05  Ulrich Drepper  <drepper@redhat.com>
	    Kwok Cheung Yeung  <kcy@codesourcery.com>

	libgomp/
	* Makefile.am (%.mod): Add -cpp and -fopenmp to compile flags.
	* Makefile.in: Regenerate.
	* fortran.c: Wrap uses of omp_set_nested and omp_get_nested with
	pragmas to ignore -Wdeprecated-declarations warnings.
	* icv.c: Likewise.
	* omp.h.in (__GOMP_DEPRECATED_5_0): Define.
	Mark omp_lock_hint_* enum values, omp_lock_hint_t, omp_set_nested,
	and omp_get_nested with __GOMP_DEPRECATED_5_0.
	* omp_lib.f90.in: Mark omp_get_nested and omp_set_nested as
	deprecated.
	* testsuite/libgomp.c++/affinity-1.C: Add -Wno-deprecated-declarations
	to test options.
	* testsuite/libgomp.c/affinity-1.c: Likewise.
	* testsuite/libgomp.c/affinity-2.c: Likewise.
	* testsuite/libgomp.c/appendix-a/a.15.1.c: Likewise.
	* testsuite/libgomp.c/lib-1.c: Likewise.
	* testsuite/libgomp.c/nested-1.c: Likewise.
	* testsuite/libgomp.c/nested-2.c: Likewise.
	* testsuite/libgomp.c/nested-3.c: Likewise.
	* testsuite/libgomp.c/pr32362-1.c: Likewise.
	* testsuite/libgomp.c/pr32362-2.c: Likewise.
	* testsuite/libgomp.c/pr32362-3.c: Likewise.
	* testsuite/libgomp.c/pr35549.c: Likewise.
	* testsuite/libgomp.c/pr42942.c: Likewise.
	* testsuite/libgomp.c/pr61200.c: Likewise.
	* testsuite/libgomp.c/sort-1.c: Likewise.
	* testsuite/libgomp.c/target-5.c: Likewise.
	* testsuite/libgomp.c/target-6.c: Likewise.
	* testsuite/libgomp.c/teams-1.c: Likewise.
	* testsuite/libgomp.c/thread-limit-1.c: Likewise.
	* testsuite/libgomp.c/thread-limit-2.c: Likewise.
	* testsuite/libgomp.c/thread-limit-4.c: Likewise.
	* testsuite/libgomp.fortran/affinity1.f90: Likewise.
	* testsuite/libgomp.fortran/lib1.f90: Likewise.
	* testsuite/libgomp.fortran/lib2.f: Likewise.
	* testsuite/libgomp.fortran/nested1.f90: Likewise.
	* testsuite/libgomp.fortran/teams1.f90: Likewise.
2020-11-05 10:32:56 -08:00
Jakub Jelinek
2298ca2d3e openmp: Implicitly discover declare target for variants of declare variant calls
This marks all variants of declare variant also declare target if the base
functions are called directly in target regions or declare target functions.

2020-10-28  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* omp-offload.c (omp_declare_target_tgt_fn_r): Handle direct calls to
	declare variant base functions.
libgomp/
	* testsuite/libgomp.c/target-42.c: New test.
2020-10-28 10:36:31 +01:00
Jakub Jelinek
3f39b64e57 xfail and improve some failing libgomp tests [PR81690]
With the patch I've posted today to fix up declare variant LTO handling,
Tobias reported the patch still doesn't work, and there are two
reasons for that.
One is that when the base function is marked implicitly as declare target,
we don't mark also implicitly the variants.  I'll need to ask on omp-lang
about details for that, but generally the compiler should do it some way.
The other one is that the way base_delay is written, it will always
call the usleep function, which is undesirable for nvptx.  While the
compiler will replace all direct calls to base_delay to nvptx_delay,
the base_delay definition which calls usleep stays.

2020-10-28  Jakub Jelinek  <jakub@redhat.com>
	    Tom de Vries  <tdevries@suse.de>

	PR testsuite/81690
	* testsuite/libgomp.c/usleep.h: New file.
	* testsuite/libgomp.c/target-32.c: Include usleep.h.
	(main): Use tgt_usleep instead of usleep.
	* testsuite/libgomp.c/thread-limit-2.c: Include usleep.h.
	(main): Use tgt_usleep instead of usleep.
2020-10-28 10:30:41 +01:00
Jakub Jelinek
f165ef89c0 lto: LTO cgraph support for late declare variant resolution [PR96680]
> I've tried to add the saving/restoring next to ipa refs saving/restoring, as
> the declare variant alt stuff is kind of extension of those, unfortunately
> following doesn't compile, because I need to also write or read a tree there
> (ctx is a portion of DECL_ATTRIBUTES of the base function), but the ipa refs
> write/read back functions don't have arguments that can be used for that.

This patch adds the streaming out and in of those omp_declare_variant_alt
hash table on the side data for the declare_variant_alt cgraph_nodes and
treats for LTO purposes the declare_variant_alt nodes (which have no body)
as if they contained a body that calls all the possible variants.
After IPA all the calls to these magic declare_variant_alt nodes are
replaced with a call to one of the variants depending on which one has the
highest score in the context.

2020-10-28  Jakub Jelinek  <jakub@redhat.com>

	PR lto/96680
gcc/
	* lto-streamer.h (omp_lto_output_declare_variant_alt,
	omp_lto_input_declare_variant_alt): Declare variant.
	* symtab.c (symtab_node::get_partitioning_class): Return
	SYMBOL_DUPLICATE for declare_variant_alt nodes.
	* passes.c (ipa_write_summaries): Add declare_variant_alt to
	partition.
	* lto-cgraph.c (output_refs): Call omp_lto_output_declare_variant_alt
	on declare_variant_alt nodes.
	(input_refs): Call omp_lto_input_declare_variant_alt on
	declare_variant_alt nodes.
	* lto-streamer-out.c (output_function): Don't call
	collect_block_tree_leafs if DECL_INITIAL is error_mark_node.
	(lto_output): Call output_function even for declare_variant_alt
	nodes.
	* omp-general.c (omp_lto_output_declare_variant_alt,
	omp_lto_input_declare_variant_alt): New functions.
gcc/lto/
	* lto-common.c (lto_fixup_prevailing_decls): Don't use
	LTO_NO_PREVAIL on TREE_LIST's TREE_PURPOSE.
	* lto-partition.c (lto_balanced_map): Treat declare_variant_alt
	nodes like definitions.
libgomp/
	* testsuite/libgomp.c/declare-variant-1.c: New test.
2020-10-28 10:29:09 +01:00
Jakub Jelinek
17c5b7e1dc openmp: Add test for OMP_TARGET_OFFLOAD=mandatory for cases where it must not fail
2020-10-22  Jakub Jelinek  <jakub@redhat.com>

	* testsuite/libgomp.c/target-41.c: New test.
2020-10-22 09:36:18 +02:00
Jakub Jelinek
74c9882b80 openmp: Change omp_get_initial_device () to match OpenMP 5.1 requirements
> Therefore, I think until omp_get_initial_device () value is changed, we

The following so far untested patch implements that change.

OpenMP 4.5 said for omp_get_initial_device:
The value of the device number is implementation defined. If it is between 0 and one less than
omp_get_num_devices() then it is valid for use with all device constructs and routines; if it is
outside that range, then it is only valid for use with the device memory routines and not in the
device clause.
and OpenMP 5.0 similarly, but OpenMP 5.1 says:
The value of the device number is the value returned by the omp_get_num_devices routine.

As the new value is compatible with what has been required earlier, I think
we can change it already now.

2020-10-22  Jakub Jelinek  <jakub@redhat.com>

	* icv.c (omp_get_initial_device): Remove including corresponding
	ialias.
	* icv-device.c (omp_get_initial_device): New function.  Return
	gomp_get_num_devices ().  Add ialias.
	* target.c (resolve_device): Don't fail with
	OMP_TARGET_OFFLOAD=mandatory if device_id is equal to
	gomp_get_num_devices ().
	(omp_target_alloc, omp_target_free, omp_target_is_present,
	omp_target_memcpy, omp_target_memcpy_rect, omp_target_associate_ptr,
	omp_target_disassociate_ptr, omp_pause_resource): Use
	gomp_get_num_devices () instead of GOMP_DEVICE_HOST_FALLBACK on the
	first use in the functions, in uses dominated by the
	gomp_get_num_devices call use num_devices_openmp instead.
	* libgomp.texi (omp_get_initial_device): Document.
	* config/gcn/icv-device.c (omp_get_initial_device): New function.
	Add ialias.
	* config/nvptx/icv-device.c (omp_get_initial_device): Likewise.
	* testsuite/libgomp.c/target-40.c: New test.
2020-10-22 09:31:01 +02:00
Kwok Cheung Yeung
8949b985db openmp: Add support for the omp_get_supported_active_levels runtime library routine
This patch implements the omp_get_supported_active_levels runtime routine
from the OpenMP 5.0 specification, which returns the maximum number of
active nested parallel regions supported by this implementation.  The
current maximum (set using the omp_set_max_active_levels routine or the
OMP_MAX_ACTIVE_LEVELS environment variable) cannot exceed this number.

2020-10-13  Kwok Cheung Yeung  <kcy@codesourcery.com>

	libgomp/
	* env.c (gomp_max_active_levels_var): Initialize to
	gomp_supported_active_levels.
	(initialize_env): Limit gomp_max_active_levels_var to be at most
	equal to gomp_supported_active_levels.
	* fortran.c (omp_get_supported_active_levels): Add ialias_redirect.
	(omp_get_supported_active_levels_): New.
	* icv.c (omp_set_max_active_levels): Limit gomp_max_active_levels_var
	to at most equal to gomp_supported_active_levels.
	(omp_get_supported_active_levels): New.
	* libgomp.h (gomp_supported_active_levels): New.
	* libgomp.map (OMP_5.0.1): Add omp_get_supported_active_levels and
	omp_get_supported_active_levels_.
	* libgomp.texi (omp_get_supported_active_levels): New.
	(omp_set_max_active_levels): Update.  Add reference to
	omp_get_supported_active_levels.
	* omp.h.in (omp_get_supported_active_levels): New.
	* omp_lib.f90.in (omp_get_supported_active_levels): New.
	* omp_lib.h.in (omp_get_supported_active_levels): New.
	* testsuite/libgomp.c/lib-2.c (main): Check omp_get_max_active_levels
	against omp_get_supported_active_levels.
	* testsuite/libgomp.fortran/lib4.f90 (lib4): Likewise.
2020-10-13 13:21:02 -07:00
Jakub Jelinek
c2ebf4f10d openmp: Add support for non-rect simd and improve collapsed simd support
The following change adds support for non-rectangular simd loops.
While working on that, I've noticed we actually don't vectorize collapsed
simd loops at all, because the code that I thought would be vectorizable
actually is not vectorized.  While for constant lower/upper
bounds and constant step of all but the outermost loop we could in theory
vectorize by computing the separate iterators using vectorized division
and modulo for each of them from the single iterator that increments
by 1 from 0 to total iteration count in the loop nest, I think that would
be fairly expensive and the chances of the loop body being vectorizable
would be low e.g. because of array indices unlikely to be linear and would
need scatters/gathers.
This patch changes the generated code to vectorize only the innermost
loop which has higher chance of being vectorized.  Below is the list of
tests and function names in which the patch resulted in vectorizing something
that hasn't been vectorized before (ok, the first line is a new test).
I've also found that the vectorizer will not vectorize loops with non-constant
steps, I plan to do something about those incrementally on the omp-expand.c
side (basically, compute number of iterations before the loop and use a 0 to
number_of_iterations step 1 IV as the main one).
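
The planned rewrite for non-constant steps can be sketched at source
level as follows.  This is an illustration under assumptions, not the
omp-expand.c change: trip_count and fill_strided are hypothetical names,
and only the "i < n2" with positive step case is handled.

```c
/* Number of iterations of "for (i = n1; i < n2; i += step)", step > 0.  */
long
trip_count (long n1, long n2, long step)
{
  return n1 < n2 ? (n2 - n1 + step - 1) / step : 0;
}

/* Store 1 into a[i] for i = n1, n1 + step, ... below n2.  The loop is
   driven by a main IV running from 0 to the precomputed count with step
   1, and the original iterator is derived from it.  The caller must make
   sure all indices are within the array.  Returns the trip count.  */
long
fill_strided (int *a, long n1, long n2, long step)
{
  long count = trip_count (n1, n2, step);

  for (long iv = 0; iv < count; iv++)
    a[n1 + iv * step] = 1;
  return count;
}
```

The main IV now has a constant step even though step itself is only known
at run time, which is the property the vectorizer needs.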

I have a problem with the composite simd vectorization though.
The point is that each thread (or task etc.) is given only a range of
consecutive iterations, so somewhere earlier it computes total number of iterations
and splits the work between the workers and then the intent is to try to vectorize it.
So, each thread is then given a begin ... end-1 range that it would handle.
This means that from the single begin value I need to compute the individual iteration
vars I should start at and then goto into the loop nest to begin iterating there
(and actually compute how many iterations the innermost loop should do each time
so that it stops before end).
Very roughly the IL I emit is something like:
int t[100][100][100];

void
foo (int a, int b, int c, int d, int e, int f, int g, int h, int u, int v, int w, int x)
{
  int i, j, k;
  int cnt;
  if (x)
    {
      i = u; j = v; k = w; goto doit;
    }
  for (i = a; i < b; i += c)
    for (j = d; j < e; j += f)
      {
        k = g;
        doit:
        for (; k < h; k++)
          t[i][j][k] += i + j + k;
      }
}
Unfortunately, some pass then turns the innermost loop to have more than 2 basic blocks
and it isn't vectorized because of that.
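
How the starting iteration variables (u, v, w in foo above) could be
derived from a single begin value can be sketched as below.  This is an
assumption-laden illustration for a rectangular nest only: struct start,
nest_count and start_from_logical are hypothetical, and the inner two
loops are assumed to have at least one iteration each.

```c
/* Starting values for the three iteration variables.  */
struct start { int i, j, k; };

/* Iterations of "for (x = lo; x < hi; x += step)", step > 0.  */
static long
nest_count (int lo, int hi, int step)
{
  return lo < hi ? ((long) hi - lo + step - 1) / step : 0;
}

/* Map a logical iteration number of the nest
     for (i = a; ...; i += c)
       for (j = d; j < e; j += f)
         for (k = g; k < h; k++)
   to the i, j, k values at which that iteration starts.  */
struct start
start_from_logical (long logical, int a, int c, int d, int e, int f,
                    int g, int h)
{
  long nk = nest_count (g, h, 1);
  long nj = nest_count (d, e, f);
  struct start s;

  s.k = g + (int) (logical % nk);       /* innermost varies fastest */
  logical /= nk;
  s.j = d + (int) (logical % nj) * f;
  logical /= nj;
  s.i = a + (int) logical * c;
  return s;
}
```

The division and modulo are done once per thread here, before entering
the nest, rather than once per iteration.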

Also, I have disabled (for now) SIMTization of collapsed simd loops, because for SIMT
it would be using a single thread anyway and I didn't want to bother with checking
SIMT on all places I've been changing.  If SIMT support is added for some or all
collapsed loops, that omp-low.c change needs to be reverted.

Here is that list of what hasn't been vectorized before and is now:

gcc/testsuite/gcc.dg/vect/vect-simd-17.c doit
gcc/testsuite/gfortran.dg/gomp/openmp-simd-6.f90 bar
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-10.c f28_taskloop_simd_normal._omp_fn.0
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-10.c _Z24f28_taskloop_simd_normalv._omp_fn.0
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-11.c f25_t_simd_normal._omp_fn.0
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-11.c f26_t_simd_normal._omp_fn.0
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-11.c f27_t_simd_normal._omp_fn.0
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-11.c f28_tpf_simd_guided32._omp_fn.1
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-11.c f28_tpf_simd_runtime._omp_fn.1
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-11.c _Z17f25_t_simd_normaliiiiiii._omp_fn.0
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-11.c _Z17f26_t_simd_normaliiiixxi._omp_fn.0
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-11.c _Z17f27_t_simd_normalv._omp_fn.0
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-11.c _Z20f28_tpf_simd_runtimev._omp_fn.1
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-11.c _Z21f28_tpf_simd_guided32v._omp_fn.1
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-2.c f7_simd_normal
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-2.c f7_simd_normal
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-2.c f8_f_simd_guided32
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-2.c f8_f_simd_guided32
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-2.c f8_f_simd_runtime
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-2.c f8_f_simd_runtime
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-2.c f8_pf_simd_guided32._omp_fn.0
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-2.c f8_pf_simd_runtime._omp_fn.0
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-2.c _Z18f8_pf_simd_runtimev._omp_fn.0
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-2.c _Z19f8_pf_simd_guided32v._omp_fn.0
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-4.c f8_taskloop_simd_normal._omp_fn.0
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-4.c _Z23f8_taskloop_simd_normalv._omp_fn.0
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-5.c f7_t_simd_normal._omp_fn.0
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-5.c f8_tpf_simd_guided32._omp_fn.1
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-5.c f8_tpf_simd_runtime._omp_fn.1
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-5.c _Z16f7_t_simd_normalv._omp_fn.0
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-5.c _Z19f8_tpf_simd_runtimev._omp_fn.1
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-5.c _Z20f8_tpf_simd_guided32v._omp_fn.1
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-8.c f25_simd_normal
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-8.c f25_simd_normal
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-8.c f26_simd_normal
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-8.c f26_simd_normal
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-8.c f27_simd_normal
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-8.c f27_simd_normal
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-8.c f28_f_simd_guided32
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-8.c f28_f_simd_guided32
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-8.c f28_f_simd_runtime
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-8.c f28_f_simd_runtime
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-8.c f28_pf_simd_guided32._omp_fn.0
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-8.c f28_pf_simd_runtime._omp_fn.0
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-8.c _Z19f28_pf_simd_runtimev._omp_fn.0
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-8.c _Z20f28_pf_simd_guided32v._omp_fn.0
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/master-combined-1.c main._omp_fn.9
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/master-combined-1.c main._omp_fn.9
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/simd-1.c f2
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/simd-1.c f2
libgomp/testsuite/libgomp.c/pr70680-2.c f1._omp_fn.0
libgomp/testsuite/libgomp.c/pr70680-2.c f2._omp_fn.0
libgomp/testsuite/libgomp.c/pr70680-2.c f3._omp_fn.0
libgomp/testsuite/libgomp.c/pr70680-2.c f4._omp_fn.0
libgomp/testsuite/libgomp.c/simd-8.c foo
libgomp/testsuite/libgomp.c/simd-9.c bar
libgomp/testsuite/libgomp.c/simd-9.c foo

2020-09-25  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* omp-low.c (scan_omp_1_stmt): Don't call scan_omp_simd for
	collapse > 1 loops as simt doesn't support collapsed loops yet.
	* omp-expand.c (expand_omp_for_init_counts, expand_omp_for_init_vars):
	Small tweaks to function comment.
	(expand_omp_simd): Rewritten collapse > 1 support to only attempt
	to vectorize the innermost loop and emit set of outer loops around it.
	For non-composite simd with collapse > 1 without broken loop don't
	even try to compute number of iterations first.  Add support for
	non-rectangular simd loops.
	(expand_omp_for): Don't sorry_at on non-rectangular simd loops.
gcc/testsuite/
	* gcc.dg/vect/vect-simd-17.c: New test.
libgomp/
	* testsuite/libgomp.c/loop-25.c: New test.
2020-09-25 10:43:37 +02:00
Jakub Jelinek
2e47c8c6ea openmp: Add support for non-rectangular loops in taskloop construct
2020-08-13  Jakub Jelinek  <jakub@redhat.com>

	* gimplify.c (gimplify_omp_taskloop_expr): New function.
	(gimplify_omp_for): Use it.  For OMP_FOR_NON_RECTANGULAR
	loops adjust in outer taskloop the var-outer decls.
	* omp-expand.c (expand_omp_taskloop_for_inner): Handle non-rectangular
	loops.
	(expand_omp_for): Don't reject non-rectangular taskloop.
	* omp-general.c (omp_extract_for_data): Don't assert that
	non-rectangular loops have static schedule, instead treat loop->m1
	or loop->m2 as if loop->n1 or loop->n2 is non-constant.

	* testsuite/libgomp.c/loop-22.c (main): Add some further tests.
	* testsuite/libgomp.c/loop-23.c (main): Likewise.
	* testsuite/libgomp.c/loop-24.c: New test.
2020-08-13 09:06:05 +02:00
Jakub Jelinek
9f3abfb84e openmp: Handle even some combined non-rectangular loops
The number of iterations computation and the logical iteration -> actual iterator values
computation can now be done separately even on composite constructs (though
for triangular loops it would still be more efficient to propagate a few values
through; will handle that incrementally).
simd and taskloop are still unhandled.

2020-08-05  Jakub Jelinek  <jakub@redhat.com>

	* omp-expand.c (expand_omp_for): Don't disallow combined non-rectangular
	loops.

	* testsuite/libgomp.c/loop-22.c: New test.
	* testsuite/libgomp.c/loop-23.c: New test.
2020-08-05 10:45:16 +02:00
Jakub Jelinek
916c7a201a openmp: Handle reduction clauses on host teams construct [PR96459]
As the new testcase shows, we weren't actually performing reductions on the
host teams construct.  Fixing that revealed a flaw in the for-14.c testcase:
the tests also perform initialization and checking around the calls to the
functions with the OpenMP constructs.  In that testcase, all the tests were
spawned from a teams construct, but only the tested loops were distribute
loops, which means the initialization and checking were performed
redundantly and racily in each team.  Fixed by performing the initialization
and checking outside of host teams and only calling the functions with
the tested constructs inside of host teams.
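
For illustration only, a minimal sketch of a reduction on a host teams construct, the feature this fix concerns. The function name `teams_sum` is hypothetical and not taken from teams-3.c; the combined `teams distribute` form is used here so the result does not depend on how many teams the runtime creates.

```c
/* Sum 0 .. n-1 with a reduction on a host teams construct.
   The reduction clause of the combined construct applies to teams,
   since distribute itself takes no reduction clause.
   Without -fopenmp the pragma is ignored and the loop runs serially,
   which gives the same result.  */
int
teams_sum (int n)
{
  int sum = 0;
#pragma omp teams distribute reduction(+:sum)
  for (int i = 0; i < n; i++)
    sum += i;
  return sum;
}
```

Calling `teams_sum (10)` should yield 45 regardless of the number of teams; before the fix, per-team partial results on a host teams construct would not have been combined.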

2020-08-05  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/96459
	* omp-low.c (lower_omp_taskreg): Call lower_reduction_clauses even
	for host teams.

	* testsuite/libgomp.c/teams-3.c: New test.
	* testsuite/libgomp.c-c++-common/for-2.h (OMPTEAMS): Define to nothing
	if not defined yet.
	(N(test)): Use it before all N(f*) calls.
	* testsuite/libgomp.c-c++-common/for-14.c (DO_PRAGMA, OMPTEAMS): Define.
	(main): Don't call all test_* functions from within
	#pragma omp teams reduction(|:err), call them directly.
2020-08-05 10:40:10 +02:00
H.J. Lu
7aa22a8f1a x86-64: Define ASM_OUTPUT_ALIGNED_DECL_LOCAL
Define ASM_OUTPUT_ALIGNED_DECL_LOCAL for large local common symbol.

gcc/

	PR target/95620
	* config/i386/x86-64.h (ASM_OUTPUT_ALIGNED_DECL_LOCAL): New.

libgomp/

	PR target/95620
	* testsuite/libgomp.c/pr95620.c: New test.
2020-07-18 08:51:54 -07:00
Jakub Jelinek
f418bd4b92 openmp: Adjust outer bounds of non-rect loops
In loops like:
  #pragma omp parallel for collapse(2)
  for (i = -4; i < 8; i++)
    for (j = 3 * i; j > 2 * i; j--)
for some outer loop iterations there are no inner loop iterations at all,
because the inner loop condition is false from the start.  In order to use
Summæ Potestate to count the number of iterations, or to transform the
logical iteration number to actual iterator values using quadratic
non-equation root discovery, the outer iterator range needs to be adjusted
such that the inner loop has at least one iteration for each outer loop
iterator value in the reduced range.  Sometimes this adjustment is done at
the start of the range, at other times at the end.

This patch implements it for the compile-time computation of the number of
loop iterations (if all expressions are compile-time constants).
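
As a rough illustration of the adjustment for the loop nest quoted above (this is not the code in omp-general.c, and all function names are hypothetical): the inner loop runs 3*i - 2*i = i times when i > 0 and not at all otherwise, so the outer range [-4, 8) can be narrowed to [1, 8) before a closed-form sum is applied.

```c
/* Count inner body executions by running the loop nest as written.  */
long
count_brute (void)
{
  long n = 0;
  for (int i = -4; i < 8; i++)
    for (int j = 3 * i; j > 2 * i; j--)
      n++;
  return n;
}

/* First outer iterator value for which the inner loop condition
   (j = 3 * i; j > 2 * i) holds at least once.  */
int
adjusted_first_i (void)
{
  int i = -4;
  while (i < 8 && !(3 * i > 2 * i))
    i++;
  return i;
}

/* Closed-form count over the narrowed outer range: for each remaining
   outer value i the inner loop runs exactly i times, so the total is
   an arithmetic series from first to last.  */
long
count_closed (void)
{
  long first = adjusted_first_i ();
  long last = 7;                       /* last i satisfying i < 8 */
  long outer_iters = last - first + 1;
  return outer_iters * (first + last) / 2;
}
```

Without the narrowing, the series formula would also count the outer values -4 .. 0 with negative "iteration counts", which is why the range has to be reduced first.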

2020-07-14  Jakub Jelinek  <jakub@redhat.com>

	* omp-general.h (struct omp_for_data): Add adjn1 member.
	* omp-general.c (omp_extract_for_data): For non-rect loop, punt on
	count computing if n1, n2 or step are not INTEGER_CST earlier.
	Narrow the outer iterator range if needed so that non-rect loop
	has at least one iteration for each outer range iteration.  Compute
	adjn1.
	* omp-expand.c (expand_omp_for_init_vars): Use adjn1 if non-NULL
	instead of the outer loop's n1.

	* testsuite/libgomp.c/loop-21.c: New test.
2020-07-14 10:31:59 +02:00
Jakub Jelinek
5acef69f9d openmp: Optimize triangular loop logical iterator to actual iterators computation using search for quadratic equation root(s)
This patch implements the optimized logical to actual iterators
computation for triangular loops.

I have a rough implementation using integers, but this one uses floating
point.  There is a small problem that -fopenmp programs aren't linked with
-lm, so it does it only if the hw has sqrt optab (and uses ifn rather than
__builtin_sqrt because it obviously doesn't need errno handling etc.).

Do you think it is ok this way, or should I use the integral computation
using inlined isqrt?  We have an inequation of the form
start >= x * t10 + t11 * (((x - 1) * x) / 2)
where t10 and t11 are signed long long values and start is unsigned long long.
The division by 2 is actually a problem for accuracy in some cases, so
if we do it with integers, we actually need to do
      long long t12 = 2 * t10 - t11;
      unsigned long long t13 = t12 * t12 + start * 8 * t11;
      unsigned long long isqrt_ = isqrtull (t13);
      long long x = (((long long) isqrt_ - t12) / t11) >> 1;
with careful overflow checking on all the computations before isqrtull
(and on overflows use the fallback implementation).
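
The integral variant described above could be sketched as follows. This is only an illustration under stated assumptions: `isqrtull` is implemented here as a plain bitwise integer square root (the message names the helper but does not show it), t11 is assumed positive, and the overflow checks the message calls for are omitted. `outer_from_start` and `outer_brute` are hypothetical names.

```c
/* Integer square root: largest r with r * r <= n (assumed helper).  */
unsigned long long
isqrtull (unsigned long long n)
{
  unsigned long long r = 0, bit = 1ULL << 62;
  while (bit > n)
    bit >>= 2;
  while (bit != 0)
    {
      if (n >= r + bit)
        {
          n -= r + bit;
          r = (r >> 1) + bit;
        }
      else
        r >>= 1;
      bit >>= 2;
    }
  return r;
}

/* Largest x with  start >= x * t10 + t11 * (((x - 1) * x) / 2),
   using the root of the corresponding quadratic.  Assumes t11 > 0 and
   that no intermediate value overflows.  */
long long
outer_from_start (unsigned long long start, long long t10, long long t11)
{
  long long t12 = 2 * t10 - t11;
  unsigned long long t13 = t12 * t12 + start * 8 * t11;
  unsigned long long isqrt_ = isqrtull (t13);
  return (((long long) isqrt_ - t12) / t11) >> 1;
}

/* Reference: linear search for the same x, for positive t10 and t11.  */
long long
outer_brute (unsigned long long start, long long t10, long long t11)
{
  long long x = 0;
  while ((unsigned long long) ((x + 1) * t10 + t11 * ((x * (x + 1)) / 2))
         <= start)
    x++;
  return x;
}
```

With t10 = 1 and t11 = 1 the inner loop has 1, 2, 3, ... iterations, so logical iteration 6 is the first one belonging to outer iteration 3; both functions agree on that.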

2020-07-09  Jakub Jelinek  <jakub@redhat.com>

	* omp-general.h (struct omp_for_data): Add min_inner_iterations
	and factor members.
	* omp-general.c (omp_extract_for_data): Initialize them and remember
	them in OMP_CLAUSE_COLLAPSE_COUNT if needed and restore from there.
	* omp-expand.c (expand_omp_for_init_counts): Fix up computation of
	counts[fd->last_nonrect] if fd->loop.n2 is INTEGER_CST.
	(expand_omp_for_init_vars): For
	fd->first_nonrect + 1 == fd->last_nonrect loops with for now
	INTEGER_CST fd->loop.n2 find quadratic equation roots instead of
	using fallback method when possible.

	* testsuite/libgomp.c/loop-19.c: New test.
	* testsuite/libgomp.c/loop-20.c: New test.
2020-07-09 12:07:17 +02:00
Jakub Jelinek
aed3ab253d openmp: Non-rectangular loop support for non-composite worksharing loops and distribute
This implements the fallback mentioned in
https://gcc.gnu.org/pipermail/gcc/2020-June/232874.html
Special cases for triangular loops etc. are to follow later.  Composite
constructs are not supported yet (need to check the passing of temporaries around),
and lastprivate might not give the same answers as a serial loop if the last
innermost body iteration isn't the last one for some of the outer loops
(that will need to be solved separately, together with rectangular loops that have no
innermost body iterations but where some of the outer loops actually iterate).
Also, simd needs work.

2020-06-27  Jakub Jelinek  <jakub@redhat.com>

	* omp-general.h (struct omp_for_data_loop): Add non_rect_referenced
	member, move outer member.
	(struct omp_for_data): Add first_nonrect and last_nonrect members.
	* omp-general.c (omp_extract_for_data): Initialize first_nonrect,
	last_nonrect and non_rect_referenced members.
	* omp-expand.c (expand_omp_for_init_counts): Handle non-rectangular
	loops.
	(expand_omp_for_init_vars): Add nonrect_bounds parameter.  Handle
	non-rectangular loops.
	(extract_omp_for_update_vars): Likewise.
	(expand_omp_for_generic, expand_omp_for_static_nochunk,
	expand_omp_for_static_chunk, expand_omp_simd,
	expand_omp_taskloop_for_outer, expand_omp_taskloop_for_inner): Adjust
	expand_omp_for_init_vars and extract_omp_for_update_vars callers.
	(expand_omp_for): Don't sorry on non-composite worksharing-loop or
	distribute.

	* testsuite/libgomp.c/loop-17.c: New test.
	* testsuite/libgomp.c/loop-18.c: New test.
2020-06-27 12:43:36 +02:00
Jakub Jelinek
dc703151d4 openmp: Implement discovery of implicit declare target to clauses
This attempts to implement what the OpenMP 5.0 spec says in the declare target
section, as amended by the 5.1 changes so far (related to device_type(host)), except
that it doesn't have the device(ancestor: ...) handling yet because we do not
support it yet, and I've so far left out the "except lambda" note, because I need
that clarified.

2020-05-12  Jakub Jelinek  <jakub@redhat.com>

	* omp-offload.h (omp_discover_implicit_declare_target): Declare.
	* omp-offload.c: Include context.h.
	(omp_declare_target_fn_p, omp_declare_target_var_p,
	omp_discover_declare_target_fn_r, omp_discover_declare_target_var_r,
	omp_discover_implicit_declare_target): New functions.
	* cgraphunit.c (analyze_functions): Call
	omp_discover_implicit_declare_target.

	* testsuite/libgomp.c/target-39.c: New test.
2020-05-12 09:17:09 +02:00
Tobias Burnus
c2211a60ff Fix OpenMP offload handling for target-link variables for nvptx (PR81689)
PR libgomp/81689
	* lto.c (offload_handle_link_vars): Propagate TREE_PUBLIC state.

	PR libgomp/81689
	* omp-offload.c (omp_finish_file): Fix target-link handling if
	targetm_common.have_named_sections is false.

	PR libgomp/81689
	* testsuite/libgomp.c/target-link-1.c: Remove xfail.
2020-03-24 15:13:56 +01:00
Jakub Jelinek
9c3cdb43c2 tree-nested: Fix handling of *reduction clauses with C array sections [PR93566]
tree-nested.c didn't handle C array sections in {,task_,in_}reduction clauses.

2020-03-14  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/93566
	* tree-nested.c (convert_nonlocal_omp_clauses,
	convert_local_omp_clauses): Handle {,in_,task_}reduction clauses
	with C/C++ array sections.

	* testsuite/libgomp.c/pr93566.c: New test.
2020-03-15 01:27:40 +01:00
Frederik Harwath
001ab12e62 openmp: ignore nowait if async execution is unsupported [PR93481]
An OpenMP "nowait" clause on a target construct currently leads to
a call to GOMP_OFFLOAD_async_run in the plugin that is used for
offloading at execution time. The nvptx plugin contains only a stub
of this function that always produces a fatal error if called.

This commit changes the "nowait" implementation to ignore the clause
if the executing device's plugin does not implement GOMP_OFFLOAD_async_run.
The stub in the nvptx plugin is removed which effectively means that
programs containing "nowait" can now be executed with nvptx offloading
as if the clause had not been used.
This behavior is consistent with the OpenMP specification which says that
"[...] execution of the target task *may* be deferred" (emphasis added),
cf. OpenMP 5.0, page 172.
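
From the user's side, the consequence is that a program using "nowait" has to be correct whether or not the target task is actually deferred. A minimal sketch, with a hypothetical function name that is not taken from target-33.c or target-34.c:

```c
/* Sum an array in a target region marked nowait, then wait for it.
   The taskwait makes the result valid whether the runtime defers the
   target task or, as with a plugin lacking async support, runs it
   synchronously.  Without -fopenmp the pragmas are ignored and the
   loop runs on the host.  */
int
sum_on_target (const int *a, int n)
{
  int sum = 0;
#pragma omp target nowait map(to: a[0:n]) map(tofrom: sum)
  for (int i = 0; i < n; i++)
    sum += a[i];
#pragma omp taskwait
  return sum;
}
```

Reading `sum` before the taskwait would be a race if the task were deferred, so the synchronization is needed on every device even where "nowait" ends up being ignored.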

libgomp/

	* plugin/plugin-nvptx.c: Remove GOMP_OFFLOAD_async_run stub.
	* target.c (gomp_load_plugin_for_device): Make "async_run" loading
	optional.
	(gomp_target_task_fn): Assert "devicep->async_run_func".
	(clear_unsupported_flags): New function to remove unsupported flags
	(right now only GOMP_TARGET_FLAG_NOWAIT) that can be ignored.
	(GOMP_target_ext): Apply clear_unsupported_flags to flags.
	* testsuite/libgomp.c/target-33.c:
	Remove xfail for offload_target_nvptx.
	* testsuite/libgomp.c/target-34.c: Likewise.
2020-02-13 10:18:31 +01:00
Frederik Harwath
fd789c816b Add xfails to libgomp tests target-{33,34}.c, target-link-1.c
Add xfails for nvptx offloading because
"no GOMP_OFFLOAD_async_run implemented in plugin-nvptx.c"
(https://gcc.gnu.org/PR81688) and because
"omp target link not implemented for nvptx"
(https://gcc.gnu.org/PR81689).

libgomp/
	* testsuite/libgomp.c/target-33.c: Add xfail for execution on
	offload_target_nvptx, cf. https://gcc.gnu.org/PR81688.
	* testsuite/libgomp.c/target-34.c: Likewise.
	* testsuite/libgomp.c/target-link-1.c: Add xfail for
	offload_target_nvptx, cf. https://gcc.gnu.org/PR81689.
2020-02-10 09:16:46 +01:00
Jakub Jelinek
9bc3b95dfe openmp: Optimize DECL_IN_CONSTANT_POOL vars in target regions
DECL_IN_CONSTANT_POOL are shared and thus don't really get emitted in the
BLOCK where they are used, so for OpenMP target regions that have initializers
gimplified into copying from them we actually map them at runtime from host to
offload devices.  This patch instead marks them as "omp declare target", so
that they are on the target device from the beginning and don't need to be
copied there.

2020-02-09  Jakub Jelinek  <jakub@redhat.com>

	* gimplify.c (gimplify_adjust_omp_clauses_1): Promote
	DECL_IN_CONSTANT_POOL variables into "omp declare target" to avoid
	copying them around between host and target.

	* testsuite/libgomp.c/target-38.c: New test.
2020-02-09 08:17:10 +01:00
Jakub Jelinek
8d9254fc8a Update copyright years.
From-SVN: r279813
2020-01-01 12:51:42 +01:00
Jakub Jelinek
601399c0df re PR middle-end/86416 ([OpenMP] Offloading - better lto1 error message if mode not supported on offloading target)
PR middle-end/86416
	* testsuite/libgomp.c/pr86416-1.c (main): Use L suffixes rather than
	q or none.
	* testsuite/libgomp.c/pr86416-2.c (main): Use Q suffixes rather than
	L or none.

From-SVN: r279552
2019-12-19 00:27:28 +01:00
Tobias Burnus
c80c9e26de PR 86416 – improve lto1 diagnostic if a mode does not exist
PR middle-end/86416
        *  Makefile.in (CFLAGS-lto-streamer-in.o): Pass target_noncanonical on.
        * lto-streamer-in.c (lto_input_mode_table): Improve unsupported-mode
        diagnostic.

        PR middle-end/86416
        * testsuite/libgomp.c/pr86416-1.c: New.
        * testsuite/libgomp.c/pr86416-2.c: New.

From-SVN: r279528
2019-12-18 17:51:08 +01:00
Rainer Orth
b8e724465b Fix failures on Solaris with -fno-common default
gcc/testsuite:
	* gcc.c-torture/execute/20030913-1.c: Rename glob to g.
	* gcc.c-torture/execute/960218-1.c: Rename glob to gl.
	* gcc.c-torture/execute/complex-6.c: Rename err to e.
	* gcc.dg/torture/ssa-pta-fn-1.c: Rename glob to g.

libgomp:
	* testsuite/libgomp.c/pr39591-1.c: Rename err to e.
	* testsuite/libgomp.c/pr39591-2.c: Likewise.
	* testsuite/libgomp.c/pr39591-3.c: Likewise.
	* testsuite/libgomp.c/private-1.c: Likewise.
	* testsuite/libgomp.c/task-1.c: Likewise.
	* testsuite/libgomp.c/task-5.c: Renamed err to serr.

From-SVN: r278571
2019-11-21 16:14:21 +00:00
Andrew Stubbs
8916ba874d Add tests for print from offload target.
2019-11-15  Andrew Stubbs  <ams@codesourcery.com>

	libgomp/
	* testsuite/libgomp.c/target-print-1.c: New file.
	* testsuite/libgomp.fortran/target-print-1.f90: New file.
	* testsuite/libgomp.oacc-c/print-1.c: New file.
	* testsuite/libgomp.oacc-fortran/print-1.f90: New file.

From-SVN: r278284
2019-11-15 10:49:10 +00:00
Jakub Jelinek
e62506f362 re PR libgomp/91530 (Several libgomp.*/scan-* tests FAIL without avx_runtime)
PR libgomp/91530
	* config/i386/sse.md (vec_shl_<mode>, vec_shr_<mode>): Use
	V_128 iterator instead of VI_128.

	* testsuite/libgomp.c/scan-21.c: New test.
	* testsuite/libgomp.c/scan-22.c: New test.

From-SVN: r274985
2019-08-28 12:13:21 +02:00
Jakub Jelinek
5cb72d83bb re PR libgomp/91530 (Several libgomp.*/scan-* tests FAIL without avx_runtime)
PR libgomp/91530
	* config/i386/sse.md (vec_shl_<mode>, vec_shr_<mode>): Use
	V_128 iterator instead of VI_128.

	* testsuite/libgomp.c/scan-21.c: New test.
	* testsuite/libgomp.c/scan-22.c: New test.

From-SVN: r274984
2019-08-28 12:12:11 +02:00
Jakub Jelinek
0ad7981cb4 re PR libgomp/91530 (Several libgomp.*/scan-* tests FAIL without avx_runtime)
PR libgomp/91530
	* testsuite/libgomp.c/scan-11.c: Add -msse2 option for sse2_runtime
	targets.
	* testsuite/libgomp.c/scan-12.c: Likewise.
	* testsuite/libgomp.c/scan-13.c: Likewise.
	* testsuite/libgomp.c/scan-14.c: Likewise.
	* testsuite/libgomp.c/scan-15.c: Likewise.
	* testsuite/libgomp.c/scan-16.c: Likewise.
	* testsuite/libgomp.c/scan-17.c: Likewise.
	* testsuite/libgomp.c/scan-18.c: Likewise.
	* testsuite/libgomp.c/scan-19.c: Likewise.
	* testsuite/libgomp.c/scan-20.c: Likewise.
	* testsuite/libgomp.c++/scan-9.C: Likewise.
	* testsuite/libgomp.c++/scan-10.C: Likewise.
	* testsuite/libgomp.c++/scan-11.C: Likewise.
	* testsuite/libgomp.c++/scan-12.C: Likewise.
	* testsuite/libgomp.c++/scan-14.C: Likewise.
	* testsuite/libgomp.c++/scan-15.C: Likewise.
	* testsuite/libgomp.c++/scan-13.C: Likewise.  Use sse2_runtime
	instead of i?86-*-* x86_64-*-* as target for scan-tree-dump-times.
	* testsuite/libgomp.c++/scan-16.C: Likewise.

From-SVN: r274947
2019-08-27 12:45:55 +02:00
Jakub Jelinek
8860d2706d gimplify.c (omp_add_variable): Use GOVD_PRIVATE | GOVD_EXPLICIT for VLA helper variables on target data even if...
* gimplify.c (omp_add_variable): Use GOVD_PRIVATE | GOVD_EXPLICIT
	for VLA helper variables on target data even if not GOVD_FIRSTPRIVATE.
	(gimplify_scan_omp_clauses): For OMP_CLAUSE_USE_DEVICE_* use just
	GOVD_EXPLICIT flags.
	(gimplify_omp_workshare): For OMP_TARGET_DATA move all
	OMP_CLAUSE_USE_DEVICE_* clauses to the end of clauses chain.
	* omp-low.c (scan_sharing_clauses): For OMP_CLAUSE_USE_DEVICE_*
	call install_var_field with mask 11 instead of 3.
	(lower_omp_target): For OMP_CLAUSE_USE_DEVICE_* use pass
	(splay_tree_key) &DECL_UID (var) to build_sender_ref instead of var.
gcc/c/
	* c-typeck.c (c_finish_omp_clauses): For C_ORT_OMP
	OMP_CLAUSE_USE_DEVICE_* clauses use oacc_reduction_head bitmap
	instead of generic_head to track duplicates.
gcc/cp/
	* semantics.c (finish_omp_clauses): For C_ORT_OMP
	OMP_CLAUSE_USE_DEVICE_* clauses use oacc_reduction_head bitmap
	instead of generic_head to track duplicates.
libgomp/
	* target.c (gomp_map_vars_internal): For GOMP_MAP_USE_DEVICE_PTR
	perform the lookup in the first loop only if !not_found_cnt, otherwise
	perform lookups for it in the second loop guarded with
	if (not_found_cnt || has_firstprivate).
	* testsuite/libgomp.c/target-37.c: New test.
	* testsuite/libgomp.c++/target-22.C: New test.

From-SVN: r274206
2019-08-08 08:39:02 +02:00
Jakub Jelinek
398e3feb8a tree-core.h (enum omp_clause_code): Adjust OMP_CLAUSE_USE_DEVICE_PTR OpenMP description.
* tree-core.h (enum omp_clause_code): Adjust OMP_CLAUSE_USE_DEVICE_PTR
	OpenMP description.  Add OMP_CLAUSE_USE_DEVICE_ADDR clause.
	* tree.c (omp_clause_num_ops, omp_clause_code_name): Add entries
	for OMP_CLAUSE_USE_DEVICE_ADDR clause.
	(walk_tree_1): Handle OMP_CLAUSE_USE_DEVICE_ADDR.
	* tree-pretty-print.c (dump_omp_clause): Likewise.
	* tree-nested.c (convert_nonlocal_omp_clauses,
	convert_local_omp_clauses): Likewise.
	* gimplify.c (gimplify_scan_omp_clauses, gimplify_adjust_omp_clauses):
	Likewise.
	* omp-low.c (scan_sharing_clauses, lower_omp_target): Likewise.
	Treat OMP_CLAUSE_USE_DEVICE_ADDR like OMP_CLAUSE_USE_DEVICE_PTR
	clause with array or reference to array types, no matter what type
	except for reference it has.
gcc/c-family/
	* c-pragma.h (enum pragma_omp_clause): Add
	PRAGMA_OMP_CLAUSE_USE_DEVICE_ADDR.  Set PRAGMA_OACC_CLAUSE_USE_DEVICE
	equal to PRAGMA_OMP_CLAUSE_USE_DEVICE_PTR instead of being a separate
	enumeration value.
gcc/c/
	* c-parser.c (c_parser_omp_clause_name): Parse use_device_addr clause.
	(c_parser_omp_clause_use_device_addr): New function.
	(c_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_USE_DEVICE_ADDR.
	(OMP_TARGET_DATA_CLAUSE_MASK): Add PRAGMA_OMP_CLAUSE_USE_DEVICE_ADDR.
	(c_parser_omp_target_data): Handle PRAGMA_OMP_CLAUSE_USE_DEVICE_ADDR
	like PRAGMA_OMP_CLAUSE_USE_DEVICE_PTR, adjust diagnostics about no
	map or use_device_* clauses.
	* c-typeck.c (c_finish_omp_clauses): For OMP_CLAUSE_USE_DEVICE_PTR
	in OpenMP, require pointer type rather than pointer or array type.
	Handle OMP_CLAUSE_USE_DEVICE_ADDR.
gcc/cp/
	* parser.c (cp_parser_omp_clause_name): Parse use_device_addr clause.
	(cp_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_USE_DEVICE_ADDR.
	(OMP_TARGET_DATA_CLAUSE_MASK): Add PRAGMA_OMP_CLAUSE_USE_DEVICE_ADDR.
	(cp_parser_omp_target_data): Handle PRAGMA_OMP_CLAUSE_USE_DEVICE_ADDR
	like PRAGMA_OMP_CLAUSE_USE_DEVICE_PTR, adjust diagnostics about no
	map or use_device_* clauses.
	* semantics.c (finish_omp_clauses): For OMP_CLAUSE_USE_DEVICE_PTR
	in OpenMP, require pointer or reference to pointer type rather than
	pointer or array or reference to pointer or array type. Handle
	OMP_CLAUSE_USE_DEVICE_ADDR.
	* pt.c (tsubst_omp_clauses): Handle OMP_CLAUSE_USE_DEVICE_ADDR.
gcc/testsuite/
	* c-c++-common/gomp/target-data-1.c (foo): Use use_device_addr clause
	instead of use_device_ptr clause where required by OpenMP 5.0, add
	further tests for both use_device_ptr and use_device_addr clauses.
libgomp/
	* testsuite/libgomp.c/target-18.c (struct S): New type.
	(foo): Use use_device_addr clause instead of use_device_ptr clause
	where required by OpenMP 5.0, add further tests for both use_device_ptr
	and use_device_addr clauses.
	* testsuite/libgomp.c++/target-9.C (struct S): New type.
	(foo): Use use_device_addr clause instead of use_device_ptr clause
	where required by OpenMP 5.0, add further tests for both use_device_ptr
	and use_device_addr clauses.  Add t and u arguments.
	(main): Adjust caller.

From-SVN: r274159
2019-08-07 09:27:10 +02:00
Jakub Jelinek
6f67abcdb0 omp-low.c (lower_rec_input_clauses): For lastprivate clauses in ctx->for_simd_scan_phase simd copy the outer var to...
* omp-low.c (lower_rec_input_clauses): For lastprivate clauses in
	ctx->for_simd_scan_phase simd copy the outer var to the privatized
	variable(s).  For conditional lastprivate look through outer
	GIMPLE_OMP_SCAN context.
	(lower_omp_1): For conditional lastprivate look through outer
	GIMPLE_OMP_SCAN context.

	* testsuite/libgomp.c/scan-19.c: New test.
	* testsuite/libgomp.c/scan-20.c: New test.

From-SVN: r273169
2019-07-06 23:58:01 +02:00
Jakub Jelinek
1f52d1a8b5 omp-low.c (struct omp_context): Add for_simd_scan_phase member.
* omp-low.c (struct omp_context): Add for_simd_scan_phase member.
	(maybe_lookup_ctx): Add forward declaration.
	(omp_find_scan): Likewise.  Walk into body of simd if composited
	with worksharing loop.
	(scan_omp_simd_scan): New function.
	(scan_omp_1_stmt): Call it.
	(lower_rec_simd_input_clauses): Don't create rvar nor rvar2 if
	ctx->for_simd_scan_phase.
	(lower_rec_input_clauses): Do much less work for inscan reductions
	in ctx->for_simd_scan_phase is_simd regions.
	(lower_omp_scan): Set is_simd also on simd constructs composited
	with worksharing loop, unless ctx->for_simd_scan_phase.  Never emit
	a sorry message.  Don't change GIMPLE_OMP_SCAN stmts into nops and
	emit their body after in simd constructs composited with worksharing
	loop.
	(lower_omp_for_scan): Handle worksharing loop composited with simd.

	* c-c++-common/gomp/scan-4.c: Don't expect sorry message.

	* testsuite/libgomp.c/scan-11.c: New test.
	* testsuite/libgomp.c/scan-12.c: New test.
	* testsuite/libgomp.c/scan-13.c: New test.
	* testsuite/libgomp.c/scan-14.c: New test.
	* testsuite/libgomp.c/scan-15.c: New test.
	* testsuite/libgomp.c/scan-16.c: New test.
	* testsuite/libgomp.c/scan-17.c: New test.
	* testsuite/libgomp.c/scan-18.c: New test.
	* testsuite/libgomp.c++/scan-9.C: New test.
	* testsuite/libgomp.c++/scan-10.C: New test.
	* testsuite/libgomp.c++/scan-11.C: New test.
	* testsuite/libgomp.c++/scan-12.C: New test.
	* testsuite/libgomp.c++/scan-13.C: New test.
	* testsuite/libgomp.c++/scan-14.C: New test.
	* testsuite/libgomp.c++/scan-15.C: New test.
	* testsuite/libgomp.c++/scan-16.C: New test.

From-SVN: r273157
2019-07-06 09:53:48 +02:00
Jakub Jelinek
2f03073ff2 omp-expand.c (expand_omp_for_static_nochunk): Don't emit GOMP_loop_start at the start of second worksharing loop in a scan.
* omp-expand.c (expand_omp_for_static_nochunk): Don't emit
	GOMP_loop_start at the start of second worksharing loop in a scan.
	For nowait, don't emit GOMP_loop_end_nowait at the end of first
	worksharing loop in a scan even if there are conditional lastprivates,
	and do emit GOMP_loop_end_nowait at the end of second worksharing loop.

	* testsuite/libgomp.c/scan-9.c: New test.
	* testsuite/libgomp.c/scan-10.c: New test.

From-SVN: r273095
2019-07-04 23:40:56 +02:00
Jakub Jelinek
2f6bb511d1 tree-core.h (enum omp_clause_code): Add OMP_CLAUSE__SCANTEMP_ clause.
* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE__SCANTEMP_
	clause.
	* tree.h (OMP_CLAUSE_DECL): Use OMP_CLAUSE__SCANTEMP_ instead of
	OMP_CLAUSE__CONDTEMP_ as range's upper bound.
	(OMP_CLAUSE__SCANTEMP__ALLOC, OMP_CLAUSE__SCANTEMP__CONTROL): Define.
	* tree.c (omp_clause_num_ops, omp_clause_code_name): Add
	OMP_CLAUSE__SCANTEMP_ entry.
	(walk_tree_1): Handle OMP_CLAUSE__SCANTEMP_.
	* tree-pretty-print.c (dump_omp_clause): Likewise.
	* tree-nested.c (convert_nonlocal_omp_clauses,
	convert_local_omp_clauses): Likewise.
	* omp-general.h (struct omp_for_data): Add have_scantemp and
	have_nonctrl_scantemp members.
	* omp-general.c (omp_extract_for_data): Initialize them.
	* omp-low.c (struct omp_context): Add scan_exclusive member.
	(scan_omp_1_stmt): Don't unnecessarily mask gimple_omp_for_kind
	result again with GF_OMP_FOR_KIND_MASK.  Initialize also
	ctx->scan_exclusive.
	(lower_rec_simd_input_clauses): Use ctx->scan_exclusive instead
	of !ctx->scan_inclusive.
	(lower_rec_input_clauses): Simplify gimplification of dtors using
	gimplify_and_add.  For non-is_simd test OMP_CLAUSE_REDUCTION_INSCAN
	rather than rvarp.  Handle OMP_CLAUSE_REDUCTION_INSCAN in worksharing
	loops.  Don't add barrier for reduction_omp_orig_ref if
	ctx->scan_??xclusive.
	(lower_reduction_clauses): Don't do anything for ctx->scan_??xclusive.
	(lower_omp_scan): Use ctx->scan_exclusive instead
	of !ctx->scan_inclusive.  Handle worksharing loops with inscan
	reductions.  Use new_vard != new_var instead of repeated
	omp_is_reference calls.
	(omp_find_scan, lower_omp_for_scan): New functions.
	(lower_omp_for): Call lower_omp_for_scan for worksharing loops with
	inscan reductions.
	* omp-expand.c (expand_omp_scantemp_alloc): New function.
	(expand_omp_for_static_nochunk): Handle fd->have_nonctrl_scantemp
	and fd->have_scantemp.

	* c-c++-common/gomp/scan-3.c (f1): Don't expect a sorry message.
	* c-c++-common/gomp/scan-5.c (foo): Likewise.

	* testsuite/libgomp.c++/scan-1.C: New test.
	* testsuite/libgomp.c++/scan-2.C: New test.
	* testsuite/libgomp.c++/scan-3.C: New test.
	* testsuite/libgomp.c++/scan-4.C: New test.
	* testsuite/libgomp.c++/scan-5.C: New test.
	* testsuite/libgomp.c++/scan-6.C: New test.
	* testsuite/libgomp.c++/scan-7.C: New test.
	* testsuite/libgomp.c++/scan-8.C: New test.
	* testsuite/libgomp.c/scan-1.c: New test.
	* testsuite/libgomp.c/scan-2.c: New test.
	* testsuite/libgomp.c/scan-3.c: New test.
	* testsuite/libgomp.c/scan-4.c: New test.
	* testsuite/libgomp.c/scan-5.c: New test.
	* testsuite/libgomp.c/scan-6.c: New test.
	* testsuite/libgomp.c/scan-7.c: New test.
	* testsuite/libgomp.c/scan-8.c: New test.

From-SVN: r272958
2019-07-03 07:03:58 +02:00
Jakub Jelinek
211b7533bf re PR middle-end/90779 (Fortran array initialization in offload regions)
PR middle-end/90779
	* gimplify.c: Include omp-offload.h and context.h.
	(gimplify_bind_expr): Add "omp declare target" attributes
	to static block scope variables inside of target region or target
	functions.

	* c-c++-common/goacc/routine-5.c (func2): Don't expect error for
	static block scope variable in #pragma acc routine.

	* testsuite/libgomp.c/pr90779.c: New test.
	* testsuite/libgomp.fortran/pr90779.f90: New test.

From-SVN: r272322
2019-06-15 09:09:04 +02:00