Commit Graph

187227 Commits

Author SHA1 Message Date
Julian Brown e2a58ed6dc openacc: Middle-end worker-partitioning support
This patch implements worker-partitioning support in the middle end,
by rewriting gimple. The OpenACC execution model requires that code
can run in either "worker single" mode where only a single worker per
gang is active, or "worker partitioned" mode, where multiple workers
per gang are active. This means we need to do something equivalent
to spawning additional workers when transitioning from worker-single
to worker-partitioned mode. However, GPUs typically fix the number of
threads of invoked kernels at launch time, so we need to do something
with the "extra" threads when they are not wanted.

The scheme used is to conditionalise each basic block that executes
in "worker single" mode for worker 0 only. Conditional branches
are handled specially so "idle" (non-0) workers follow along with
worker 0. On transitioning to "worker partitioned" mode, any variables
modified by worker 0 are propagated to the other workers via GPU shared
memory. Special care is taken for routine calls, writes through pointers,
and so forth, as follows:

  - There are two types of function calls to consider in worker-single
    mode: "normal" calls to maths library routines, etc. are called from
    worker 0 only. OpenACC routines may contain worker-partitioned loops
    themselves, so are called from all workers, including "idle" ones.

  - SSA names set in worker-single mode, but used in worker-partitioned
    mode, are copied to shared memory in worker 0. Other workers retrieve
    the value from the appropriate shared-memory location after a barrier,
    and new phi nodes are introduced at the convergence point to resolve
    the worker 0/other worker copies of the value.

  - Local scalar variables (on the stack) also need special handling. We
    broadcast any variables that are written in the current worker-single
    block, and that are read in any worker-partitioned block.  (This is
    believed to be safe, and is flow-insensitive to ease analysis.)

  - Local aggregates (arrays and composites) on the stack are *not*
    broadcast. Instead we force gimple stmts modifying elements/fields of
    local aggregates into fully-partitioned mode. The RHS of the
    assignment is a scalar, and is thus subject to broadcasting as above.

  - Writes through pointers may affect any local variable that has
    its address taken. We use points-to analysis to determine the set
    of potentially-affected variables for a given pointer indirection.
    We broadcast any such variable which is used in worker-partitioned
    mode, on a per-block basis for any block containing a write through
    a pointer.
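
The scheme can be pictured with a host-side simulation in plain C.  This
is only a sketch under stated assumptions, not what the pass emits: the
workers are modelled as loop iterations, GPU shared memory as a static
variable, and the names NUM_WORKERS, shared_slot and gang_scaled_sum are
hypothetical.

```c
#include <stddef.h>

#define NUM_WORKERS 4   /* hypothetical number of workers per gang */

/* Stand-in for the GPU shared memory used for the broadcast.  */
static int shared_slot;

/* Compute the sum of SCALE * DATA[i] the way one gang might.  */
int
gang_scaled_sum (const int *data, size_t n, int base)
{
  int scale[NUM_WORKERS];
  int partial[NUM_WORKERS] = { 0 };

  /* Worker-single block: only worker 0 executes the body; the other
     workers are idle and skip it.  */
  for (int w = 0; w < NUM_WORKERS; w++)
    if (w == 0)
      {
        scale[w] = base * 2 + 1;
        shared_slot = scale[w];   /* worker 0 publishes the value */
      }

  /* A barrier would sit here on real hardware.  Afterwards the idle
     workers fetch the value that worker 0 set.  */
  for (int w = 1; w < NUM_WORKERS; w++)
    scale[w] = shared_slot;

  /* Worker-partitioned block: every worker is active and takes a
     strided share of the iterations.  */
  for (int w = 0; w < NUM_WORKERS; w++)
    for (size_t i = (size_t) w; i < n; i += NUM_WORKERS)
      partial[w] += scale[w] * data[i];

  int total = 0;
  for (int w = 0; w < NUM_WORKERS; w++)
    total += partial[w];
  return total;
}
```

With base == 1 every worker ends up with scale == 3, so the five elements
{1, 2, 3, 4, 5} sum to 45 regardless of how they are split among workers.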

Some slides about the implementation (from 2018) are available at:

  https://jtb20.github.io/gcnworkers.pdf

	gcc/
	* Makefile.in (OBJS): Add omp-oacc-neuter-broadcast.o.
	* doc/tm.texi.in (TARGET_GOACC_CREATE_WORKER_BROADCAST_RECORD):
	Add documentation hook.
	* doc/tm.texi: Regenerate.
	* omp-oacc-neuter-broadcast.cc: New file.
	* omp-builtins.def (BUILT_IN_GOACC_BARRIER)
	(BUILT_IN_GOACC_SINGLE_START, BUILT_IN_GOACC_SINGLE_COPY_START)
	(BUILT_IN_GOACC_SINGLE_COPY_END): New builtins.
	* passes.def (pass_omp_oacc_neuter_broadcast): Add pass.
	* target.def (goacc.create_worker_broadcast_record): Add target
	hook.
	* tree-pass.h (make_pass_omp_oacc_neuter_broadcast): Add
	prototype.
	* config/gcn/gcn-protos.h (gcn_goacc_adjust_propagation_record):
	Rename prototype to...
	(gcn_goacc_create_worker_broadcast_record): ... this.
	* config/gcn/gcn-tree.c (gcn_goacc_adjust_propagation_record): Rename
	function to...
	(gcn_goacc_create_worker_broadcast_record): ... this.
	* config/gcn/gcn.c (TARGET_GOACC_ADJUST_PROPAGATION_RECORD):
	Rename to...
	(TARGET_GOACC_CREATE_WORKER_BROADCAST_RECORD): ... this.

Co-Authored-By: Nathan Sidwell <nathan@codesourcery.com> (via 'gcc/config/nvptx/nvptx.c' master)
Co-Authored-By: Kwok Cheung Yeung <kcy@codesourcery.com>
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2021-08-09 14:47:42 +02:00
Tejas Belagod e2e0b85c1e PR101609: Use the correct iterator for AArch64 vector right shift pattern
Loops containing long long shifts fail to vectorize due to the vectorizer
not being able to recognize long long right shifts. This is due to a bug
in the iterator used for the vashr and vlshr patterns in aarch64-simd.md.
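
A loop of roughly the following shape is the kind of code affected.  This
is a hypothetical sketch, not a copy of the new testcases; the function
names are made up for illustration.

```c
#include <stddef.h>

/* Arithmetic right shift of 64-bit elements by per-element amounts.  */
void
shift_right_arith (long long *restrict dst, const long long *restrict src,
                   const long long *restrict amt, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i] >> amt[i];
}

/* Logical right shift of 64-bit elements by per-element amounts.  */
void
shift_right_logical (unsigned long long *restrict dst,
                     const unsigned long long *restrict src,
                     const unsigned long long *restrict amt, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i] >> amt[i];
}
```

Whether such loops are actually vectorized depends on the target options
and optimization level in use.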

2021-08-09  Tejas Belagod  <tejas.belagod@arm.com>

gcc/ChangeLog
	PR target/101609
	* config/aarch64/aarch64-simd.md (vlshr<mode>3, vashr<mode>3): Use
	the right iterator.

gcc/testsuite/ChangeLog
	* gcc.target/aarch64/vect-shr-reg.c: New testcase.
	* gcc.target/aarch64/vect-shr-reg-run.c: Likewise.
2021-08-09 12:54:14 +01:00
Thomas Schwinge 0095afa82a Remove 'gcc/omp-offload.c' from 'GTFILES'
Given that it doesn't contain any 'GTY' markers, no 'gcc/gt-omp-offload.h' file
gets generated (and '#include'd anywhere).

Small fix-up for r243673 (Git commit 629b3d75c8)
"Split omp-low into multiple files".

	gcc/
	* Makefile.in (GTFILES): Remove '$(srcdir)/omp-offload.c'.
2021-08-09 13:40:54 +02:00
Thomas Schwinge 2a700fb8ea Don't consider '-foffload-abi' in 'DEF_GOACC_BUILTIN', 'DEF_GOMP_BUILTIN'
Since Tom's PR64707 commit r220037 (Git commit
1506ae0e1e) "Make fopenmp an LTO option" as well
as PR64672 commit r220038 (Git commit a0c88d0629)
"Make fopenacc an LTO option", we're now actually passing
'-fopenacc'/'-fopenmp' to the 'mkoffload's, which will pass these on to the
offload compilers.

	gcc/
	* builtins.def (DEF_GOACC_BUILTIN, DEF_GOMP_BUILTIN): Don't
	consider '-foffload-abi'.
	* common.opt (-foffload-abi): Remove 'Var', 'Init'.
	* opts.c (common_handle_option) <-foffload-abi> [ACCEL_COMPILER]:
	Ignore.
2021-08-09 13:39:38 +02:00
Thomas Schwinge c523051930 Sanity check that 'Init' doesn't appear without 'Var' in '*.opt' files
... as that doesn't make sense.

    @item Init(@var{value})
    The variable specified by the @code{Var} property should be statically
    initialized to @var{value}.  [...]

	gcc/
	* optc-gen.awk: Sanity check that 'Init' doesn't appear without
	'Var'.
2021-08-09 13:38:14 +02:00
Thomas Schwinge 06870af3e4 [OpenACC] Clean up unused 'BUILT_IN_ACC_GET_DEVICE_TYPE'
Unused as of r229767 (Git commit e50146711b)
"OpenACC reductions".

	gcc/
	* omp-builtins.def (BUILT_IN_ACC_GET_DEVICE_TYPE): Remove.
2021-08-09 13:36:19 +02:00
Thomas Schwinge 7cc85851bc [documentation] No need anymore to "mention ['gt-*.h' file] as a dependency in the 'Makefile'"
... as of r202907 (Git commit b6541edc52)
"remove explicit dependencies".

	gcc/
	* doc/gty.texi (Files): Update.
2021-08-09 13:28:10 +02:00
Thomas Schwinge 67b8443bd1 [documentation] Fix GTY header file example
Fix-up for CVS 'gcc/doc/gty.texi' r1.6 (Subversion r55857, Git
commit cba57c9d40) "Minor doc updates"

	gcc/
	* doc/gty.texi (Files): Fix GTY header file example.
2021-08-09 13:27:54 +02:00
Roger Sayle 848bcda52d Improve handling of unknown sign bit in CCP.
This middle-end patch implements several related improvements to
tree-ssa's conditional (bit) constant propagation pass.  The current
code handling ordered comparisons contains the comment "If the
most significant bits are not known we know nothing" which is not
entirely true [this test even prevents this pass understanding these
comparisons always have a zero or one result].  This patch introduces
a new value_mask_to_min_max helper function, that understands the
different semantics of the most significant bit on signed vs.
unsigned values.  This allows us to generalize ordered comparisons,
GE_EXPR, GT_EXPR, LE_EXPR and LT_EXPR, where the code is tweaked to
correctly handle the potential equal cases.  Then finally support
is added for the related tree codes MIN_EXPR, MAX_EXPR, ABS_EXPR
and ABSU_EXPR.

Regression testing revealed three test cases in the testsuite that
were checking for specific optimizations that are now being performed
earlier than expected.  These tests can continue to check their
original transformations by explicitly adding -fno-tree-ccp to their
dg-options (some already specify -fno-ipa-vrp or -fno-tree-forwprop
for the same reason).

2021-08-09  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* tree-ssa-ccp.c (value_mask_to_min_max): Helper function to
	determine the upper and lower bounds from a mask-value pair.
	(bit_value_unop) [ABS_EXPR, ABSU_EXPR]: Add support for
	absolute value and unsigned absolute value expressions.
	(bit_value_binop):  Initialize *VAL's precision.
	[LT_EXPR, LE_EXPR]: Use value_mask_to_min_max to determine
	upper and lower bounds of operands.  Add LE_EXPR/GE_EXPR
	support when the operands are unknown but potentially equal.
	[MIN_EXPR, MAX_EXPR]: Support minimum/maximum expressions.

gcc/testsuite/ChangeLog
	* gcc.dg/pr68217.c: Add -fno-tree-ccp option.
	* gcc.dg/tree-ssa/vrp24.c: Add -fno-tree-ccp option.
	* g++.dg/ipa/pure-const-3.C: Add -fno-tree-ccp option.
2021-08-09 12:02:53 +01:00
Jonathan Wakely 2eff2a3cb5 libstdc++: Make allocator equality comparable in tests
libstdc++-v3/ChangeLog:

	* testsuite/23_containers/unordered_map/cons/default.cc: Add
	equality comparison operators to allocator.
	* testsuite/23_containers/unordered_set/cons/default.cc:
	Likewise.
2021-08-09 11:43:50 +01:00
Tobias Burnus 527a1cf32c testsuite/lib/gfortran.exp: Add -I for ISO*.h [PR101305, PR101660]
This patch adds -I$specdir/libgfortran to GFORTRAN_UNDER_TEST, when
set by proc gfortran_init. As the $specdir depends on the multilib
setting, it has to be re-set for a different multilib; hence, we track
whether a previous call to gfortran_init set that var or whether it
was set differently.

gcc/testsuite/
	PR libfortran/101305
	PR fortran/101660

	* lib/gfortran.exp (gfortran_init): Add -I $specdir/libgfortran to
	GFORTRAN_UNDER_TEST; update it when set by previous gfortran_init call.
	* gfortran.dg/ISO_Fortran_binding_1.c: Use <...> not "..." for
	ISO_Fortran_binding.h's #include.
	* gfortran.dg/ISO_Fortran_binding_10.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_11.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_12.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_15.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_16.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_17.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_18.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_3.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_5.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_6.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_7.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_8.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_9.c: Likewise.
	* gfortran.dg/PR94327.c: Likewise.
	* gfortran.dg/PR94331.c: Likewise.
	* gfortran.dg/bind_c_array_params_3_aux.c: Likewise.
	* gfortran.dg/iso_fortran_binding_uint8_array_driver.c: Likewise.
	* gfortran.dg/pr93524.c: Likewise.
2021-08-09 12:35:23 +02:00
Bin Cheng a5e78ee60c aarch64: Expand %<w> correctly according to mode iterator
Pattern "*extend<SHORT:mode><GPI:mode>2_aarch64" is duplicated
from the corresponding zero_extend pattern, however %<w> needs
to be expanded according to its mode iterator because the smov
instruction is different to umov.

2021-08-09  Bin Cheng  <bin.cheng@linux.alibaba.com>

gcc/
	* config/aarch64/aarch64.md
	(*extend<SHORT:mode><GPI:mode>2_aarch64): Use %<GPI:w>0.
2021-08-09 17:21:03 +08:00
Jonathan Wright a5e3c1e2c8 testsuite: aarch64: Fix invalid SVE tests
Some scan-assembler tests for SVE code generation were erroneously
split over multiple lines - meaning they became invalid. This patch
gets the tests working again by putting each test on a single line.

The extract_[1234].c tests are corrected to expect that extracted
32-bit values are moved into 'w' registers rather than 'x' registers.

gcc/testsuite/ChangeLog:

2021-08-06  Jonathan Wright  <jonathan.wright@arm.com>

	* gcc.target/aarch64/sve/dup_lane_1.c: Don't split
	scan-assembler tests over multiple lines. Expect 32-bit
	result values in 'w' registers.
	* gcc.target/aarch64/sve/extract_1.c: Likewise.
	* gcc.target/aarch64/sve/extract_2.c: Likewise.
	* gcc.target/aarch64/sve/extract_3.c: Likewise.
	* gcc.target/aarch64/sve/extract_4.c: Likewise.
2021-08-09 09:59:05 +01:00
Jonathan Wright da81e30d21 testsuite: aarch64: Fix failing vector structure tests on big-endian
Recent refactoring of the arm_neon.h header enabled better code
generation for intrinsics that manipulate vector structures. New
tests were also added to verify the benefit of these changes. It now
transpires that the code generation improvements are observed only on
little-endian systems. This patch restricts the code generation tests
to little-endian targets.

gcc/testsuite/ChangeLog:

2021-08-04  Jonathan Wright  <jonathan.wright@arm.com>

	* gcc.target/aarch64/vector_structure_intrinsics.c: Restrict
	tests to little-endian targets.
2021-08-09 09:58:43 +01:00
Hongyu Wang 78be906b26 MAINTAINERS: Add myself for write after approval
ChangeLog:

	* MAINTAINERS (Write After Approval): Add myself.
2021-08-09 09:59:44 +08:00
GCC Administrator 844105d912 Daily bump. 2021-08-09 00:16:32 +00:00
Sergei Trofimovich 5f564fd013 lra: Fix s/otput/output/ typo in debug output
gcc/
	* lra-constraints.c: Fix s/otput/output/ typo.
2021-08-08 21:37:20 +01:00
François Dumont ad9c394114 libstdc++: Fix dg-prune-output assertion message
Since the __glibcxx_assert changes in r6b42b5a, the generated assertion message
has changed.

libstdc++-v3/ChangeLog:

	* testsuite/25_algorithms/copy/debug/constexpr_neg.cc: Replace 'failed_assertion'
	dg-prune-output reason with 'builtin_unreachable'.
	* testsuite/25_algorithms/copy_backward/debug/constexpr_neg.cc: Likewise.
	* testsuite/25_algorithms/equal/debug/constexpr_neg.cc: Likewise.
	* testsuite/25_algorithms/lower_bound/debug/constexpr_partitioned_neg.cc: Likewise.
	* testsuite/25_algorithms/lower_bound/debug/constexpr_partitioned_pred_neg.cc: Likewise.
	* testsuite/25_algorithms/lower_bound/debug/constexpr_valid_range_neg.cc: Likewise.
	* testsuite/25_algorithms/upper_bound/debug/constexpr_partitioned_neg.cc: Likewise.
	* testsuite/25_algorithms/upper_bound/debug/constexpr_partitioned_pred_neg.cc: Likewise.
	* testsuite/25_algorithms/upper_bound/debug/constexpr_valid_range_neg.cc: Likewise.
2021-08-08 19:12:22 +02:00
Jeff Law fd26ce8398 Fix c6x test compromised by recent improvements to bswap & rotates
gcc/testsuite
	* gcc.target/tic6x/rotdi16-scan.c: Pull rotate into its own function.
2021-08-08 11:20:41 -04:00
Hans-Peter Nilsson e9b639c4b5 libstdc++: Tweak timeout for testsuite/std/ranges/iota/max_size_type.cc
A simulator can easily spend more than 10 minutes running
this test-case, and the default timeout is at 5 minutes.
Better allow even slower machines; use 4 as the factor.

Regarding relative runtime numbers (very local; mmixware simulator for
mmix-knuth-mmixware): test01 and test05 finish momentarily; test02 at
about 2 minutes, and test03 about 2m30, but test04 itself runs for
more than 6 minutes and so times out.

Not sure if it's better to split up this test, as the excessive
runtime may be unintended, but this seemed simplest.

libstdc++-v3:
	* testsuite/std/ranges/iota/max_size_type.cc: Set
	dg-timeout-factor to 4.
2021-08-08 10:52:50 +02:00
GCC Administrator 7b51202c2a Daily bump. 2021-08-08 00:16:32 +00:00
Ian Lance Taylor 307e0d4036 compiler: support export/import of unsafe.Add/Slice
For golang/go#19367
For golang/go#40481

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/340549
2021-08-07 13:37:58 -07:00
Harald Anlauf cd754efa9a Fortran: ICE with automatic character object, save, and various options
gcc/fortran/ChangeLog:

	PR fortran/68568
	* primary.c (gfc_expr_attr): Variable attribute can only be
	inquired when symtree is non-NULL.
2021-08-07 20:30:32 +02:00
H.J. Lu 6866f4819a Add tests for PR tree-optimization/88531
PR tree-optimization/88531
	* gcc.target/i386/pr88531-1a.c: New test.
	* gcc.target/i386/pr88531-1b.c: Likewise.
	* gcc.target/i386/pr88531-1c.c: Likewise.
	* gcc.target/i386/pr88531-2a.c: Likewise.
	* gcc.target/i386/pr88531-2b.c: Likewise.
	* gcc.target/i386/pr88531-2c.c: Likewise.
2021-08-07 07:34:44 -07:00
GCC Administrator f92f477852 Daily bump. 2021-08-07 00:16:39 +00:00
Martin Sebor 81d6cdd335 Move more code to new gimple-ssa-warn-access pass.
gcc/ChangeLog:

	* builtins.c (expand_builtin_memchr): Move to gimple-ssa-warn-access.cc.
	(expand_builtin_strcat): Same.
	(expand_builtin_stpncpy): Same.
	(expand_builtin_strncat): Same.
	(check_read_access): Same.
	(check_memop_access): Same.
	(expand_builtin_strlen): Move checks to gimple-ssa-warn-access.cc.
	(expand_builtin_strnlen): Same.
	(expand_builtin_memcpy): Same.
	(expand_builtin_memmove): Same.
	(expand_builtin_mempcpy): Same.
	(expand_builtin_strcpy): Same.
	(expand_builtin_strcpy_args): Same.
	(expand_builtin_stpcpy_1): Same.
	(expand_builtin_strncpy): Same.
	(expand_builtin_memset): Same.
	(expand_builtin_bzero): Same.
	(expand_builtin_strcmp): Same.
	(expand_builtin_strncmp): Same.
	(expand_builtin): Remove handlers.
	(fold_builtin_strlen): Add a comment.
	* builtins.h (check_access): Move to gimple-ssa-warn-access.cc.
	* calls.c (maybe_warn_nonstring_arg): Same.
	* diagnostic-spec.c (nowarn_spec_t::nowarn_spec_t): Add warning option.
	* gimple-fold.c (gimple_fold_builtin_strcpy): Pass argument to callee.
	(gimple_fold_builtin_stpcpy): Same.
	* gimple-ssa-warn-access.cc (has_location): New function.
	(get_location): Same.
	(get_callee_fndecl): Same.
	(call_nargs): Same.
	(call_arg): Same.
	(warn_string_no_nul): Define.
	(unterminated_array): Same.
	(check_nul_terminated_array): Same.
	(maybe_warn_nonstring_arg): Same.
	(maybe_warn_for_bound): Same.
	(warn_for_access): Same.
	(check_access): Same.
	(check_memop_access): Same.
	(check_read_access): Same.
	(warn_dealloc_offset): Use helper functions.
	(maybe_emit_free_warning): Same.
	(class pass_waccess): Add members.
	(check_strcat): New function.
	(check_strncat): New function.
	(check_stxcpy): New function.
	(check_stxncpy): New function.
	(check_strncmp): New function.
	(pass_waccess::check_builtin): New function.
	(pass_waccess::check): Call it.
	* gimple-ssa-warn-access.h (warn_string_no_nul): Move here from
	builtins.h.
	(maybe_warn_for_bound): Same.
	(check_access): Same.
	(check_memop_access): Same.
	(check_read_access): Same.
	* pointer-query.h (struct access_data): Define a ctor overload.

gcc/testsuite/ChangeLog:

	* c-c++-common/Wsizeof-pointer-memaccess1.c: Also disable
	-Wstringop-overread.
	* c-c++-common/attr-nonstring-3.c: Adjust pattern of expected message.
	* gcc.dg/Warray-bounds-39.c: Add an xfail due to a known bug.
	* gcc.dg/Wstring-compare-3.c: Also disable -Wstringop-overread.
	* gcc.dg/attr-nonstring-2.c: Adjust pattern of expected message.
	* gcc.dg/attr-nonstring-4.c: Same.
	* gcc.dg/Wstringop-overread-6.c: New test.
	* gcc.dg/sso-14.c: Fix typos to avoid buffer overflow.
2021-08-06 16:08:36 -06:00
Cherry Mui 629b5699fb compiler: make escape analysis more strict about runtime calls
Following the previous CL, in the escape analysis list all the
expected runtime calls, and fail if an unexpected one is seen.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/340397
2021-08-06 12:37:48 -07:00
Christophe Lyon aff75af3b5 arm: Fix pr69245.c testcase for reorder assembler architecture directives [PR101723]
In gcc.target/arm/pr69245.c, to have a .fpu neon-vfpv4 directive, make
sure code for fn1() is emitted, by removing the static keyword.

Fix a typo in gcc.target/arm/pr69245.c, where \s should be \\s.

2021-08-06  Christophe Lyon  <christophe.lyon@foss.st.com>

	gcc/testsuite/

	PR target/101723
	* gcc.target/arm/pr69245.c: Make sure to emit code for fn1, fix
	typo.
2021-08-06 14:25:47 +00:00
Christophe Lyon a22b3e022c arm: Fix typos for reorder assembler architecture directives [PR101723]
Two tests had typos preventing them from passing, committed as obvious.

2021-08-06  Christophe Lyon  <christophe.lyon@foss.st.com>

	gcc/testsuite/
	PR target/101723
	* gcc.target/arm/attr-neon3.c: Fix typo.
	* gcc.target/arm/pragma_fpu_attribute_2.c: Fix typo.
2021-08-06 14:06:44 +00:00
Richard Biener f31da42e04 tree-optimization/101801 - remove vect_worthwhile_without_simd_p
This removes the cost part of vect_worthwhile_without_simd_p, retaining
only the correctness bits.  The reason is that the cost heuristics
do not properly account for SLP, and the check whether "without simd"
applies misfires for AVX512 mask vectors at the moment, leading to
missed vectorizations there.

Any costing decision should take place in the cost modeling, no
single stmt is to disable all vectorization on its own.

2021-08-06  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/101801
	* tree-vectorizer.h (vect_worthwhile_without_simd_p): Rename...
	(vect_can_vectorize_without_simd_p): ... to this.
	* tree-vect-loop.c (vect_worthwhile_without_simd_p): Rename...
	(vect_can_vectorize_without_simd_p): ... to this and fold
	in vect_min_worthwhile_factor.
	(vect_min_worthwhile_factor): Remove.
	(vectorizable_reduction): Adjust and remove the cost part.
	* tree-vect-stmts.c (vectorizable_shift): Likewise.
	(vectorizable_operation): Likewise.
2021-08-06 15:32:30 +02:00
Jonathan Wakely c2a984a357 libstdc++: Also move the [[nodiscard]] attributes in <compare>
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	* libsupc++/compare (compare_three_way, strong_order)
	(weak_order, partial_order, compare_strong_order_fallback)
	(compare_weak_order_fallback, compare_partial_order_fallback):
	Move nodiscard attributes to correct location.
2021-08-06 13:43:26 +01:00
Uros Bizjak cd04e829c3 i386: Fix conditional move reg-to-reg move elimination peepholes [PR101797]
Add missing operand predicate, otherwise any RTX will match.

2021-08-06  Uroš Bizjak  <ubizjak@gmail.com>

gcc/
	PR target/101797
	* config/i386/i386.md (cmove reg-to-reg move elimination peephole2s):
	Add general_gr_operand predicate to operand 3.

gcc/testsuite/
	PR target/101797
	* gcc.target/i386/pr101797.c: New test.
2021-08-06 14:22:15 +02:00
Roger Sayle 9d8eacc2ae Use CFN_BUILT_IN_CLRSB instead of BUILT_IN_CLRSB in switch.
This patch replaces the use of BUILT_IN_CLRSB with CFN_BUILT_IN_CLRSB
in my recent patch to tree-ssa-phiopt.c.  Both of these have identical
values, so there's no change in behavior, but consistent use of the same
enumeration avoids warnings when using clang (or static analysis tools).
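
The situation can be illustrated with two small enumerations that share
their numbering.  This is only a sketch: the SK_-prefixed names and the
value 42 are hypothetical stand-ins, not GCC's real definitions.

```c
/* Hypothetical stand-ins for two enumerations with identical values.  */
enum sketch_built_in { SK_BUILT_IN_CLRSB = 42 };
enum sketch_combined_fn { SK_CFN_BUILT_IN_CLRSB = SK_BUILT_IN_CLRSB };

/* Switch on a combined-function code using labels from the same
   enumeration as the controlling expression.  */
int
is_clrsb (enum sketch_combined_fn fn)
{
  switch (fn)
    {
    case SK_CFN_BUILT_IN_CLRSB:
      return 1;
    default:
      return 0;
    }
}
```

Using SK_BUILT_IN_CLRSB as the case label would select the same value,
but mixes two enumeration types in one switch, which is the kind of
construct some compilers and analysis tools flag.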

2021-08-06  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* tree-ssa-phiopt.c (cond_removal_in_builtin_zero_pattern): Use
	CFN_BUILT_IN_CLRSB* instead of BUILT_IN_CLRSB* for consistency.
2021-08-06 12:30:53 +01:00
Tamar Christina 6b0bde7eef middle-end/AArch64: Fix bootstrap after vec changes
The build has been broken since a3d3e8c362, which removed the ability to pass
vec<> by value; vec<> must now be passed by reference.

However, some language hooks used by AArch64 were not updated, which breaks the
build on AArch64.  This patch updates these hooks.

gcc/c/ChangeLog:

	* c-decl.c (c_simulate_enum_decl): Pass vec<> by pointer.
	* c-tree.h (c_simulate_enum_decl): Likewise.

gcc/ChangeLog:

	* config/aarch64/aarch64-sve-builtins.cc (register_svpattern,
	register_svprfop): Pass vec<> by pointer.
	* langhooks-def.h (lhd_simulate_enum_decl): Likewise.
	* langhooks.c (lhd_simulate_enum_decl): Likewise.
	* langhooks.h (struct lang_hooks_for_types): Likewise.

gcc/cp/ChangeLog:

	* cp-objcp-common.h (cxx_simulate_enum_decl): Pass vec<> by pointer.
	* decl.c (cxx_simulate_enum_decl): Likewise.
2021-08-06 12:21:05 +01:00
Sebastian Huber 3c94db20be gcov: Remove <stdint.h> from libgcov-driver.c
In the patch to add __gcov_info_to_gcda(), the include of <stdint.h> was added
to libgcov-driver.c even if inhibit_libc is defined.  It turned out that this
header file is not always available.  Remove the include of <stdint.h> and
replace the intptr_t with the compiler provided __INTPTR_TYPE__.
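
A minimal sketch of the technique, assuming a compiler that predefines
__INTPTR_TYPE__ (GCC does); the typedef and function names below are
hypothetical and do not appear in libgcov-driver.c.

```c
/* __INTPTR_TYPE__ is predefined by the compiler, so no header is needed.  */
typedef __INTPTR_TYPE__ sketch_intptr_t;

/* Round-trip a pointer through the integer type and check that it
   compares equal to the original.  */
int
pointer_round_trips (const void *p)
{
  sketch_intptr_t as_int = (sketch_intptr_t) p;
  return (const void *) as_int == p;
}
```

This avoids any dependency on <stdint.h>, which is the point when the
header may be missing from the build environment.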

libgcc/

	* libgcov-driver.c (#include <stdint.h>): Remove.
	(write_topn_counters): Use __INTPTR_TYPE__ instead of intptr_t.
2021-08-06 12:27:48 +02:00
Jonathan Wright bc181adf26 aarch64: Use memcpy to copy structures in bfloat vst* intrinsics
Use __builtin_memcpy to copy vector structures instead of using a
union - or constructing a new opaque structure one vector at a time -
in each of the vst[234][q] and vst1[q]_x[234] bfloat Neon intrinsics
in arm_neon.h.

Add new code generation tests to verify that superfluous move
instructions are not generated for the vst[234]q or vst1q_x[234]
bfloat intrinsics.
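
The copying technique can be sketched outside arm_neon.h as follows.
The types here are hypothetical stand-ins for a user-visible vector
structure and an opaque builtin type of the same size, and plain memcpy
from <string.h> is used in place of __builtin_memcpy.

```c
#include <string.h>

/* Hypothetical user-visible structure holding two 4-element vectors.  */
typedef struct { float val[2][4]; } sketch_pair;

/* Hypothetical opaque type of the same size, as a builtin might take.  */
typedef struct { unsigned char bytes[sizeof (sketch_pair)]; } sketch_opaque;

/* Copy the whole structure in one go instead of going through a union
   or inserting one vector at a time.  */
sketch_opaque
to_opaque (sketch_pair in)
{
  sketch_opaque out;
  memcpy (&out, &in, sizeof out);
  return out;
}

/* The reverse direction, using the same whole-object copy.  */
sketch_pair
from_opaque (sketch_opaque in)
{
  sketch_pair out;
  memcpy (&out, &in, sizeof out);
  return out;
}
```

A fixed-size copy like this is something the compiler can usually see
through, which is presumably why it leads to fewer superfluous moves.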

gcc/ChangeLog:

2021-07-30  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/arm_neon.h (vst1_bf16_x2): Use
	__builtin_memcpy instead of constructing an additional
	__builtin_aarch64_simd_oi one vector at a time.
	(vst1q_bf16_x2): Likewise.
	(vst1_bf16_x3): Use __builtin_memcpy instead of constructing
	an additional __builtin_aarch64_simd_ci one vector at a time.
	(vst1q_bf16_x3): Likewise.
	(vst1_bf16_x4): Use __builtin_memcpy instead of a union.
	(vst1q_bf16_x4): Likewise.
	(vst2_bf16): Use __builtin_memcpy instead of constructing an
	additional __builtin_aarch64_simd_oi one vector at a time.
	(vst2q_bf16): Likewise.
	(vst3_bf16): Use __builtin_memcpy instead of constructing an
	additional __builtin_aarch64_simd_ci mode one vector at a
	time.
	(vst3q_bf16): Likewise.
	(vst4_bf16): Use __builtin_memcpy instead of constructing an
	additional __builtin_aarch64_simd_xi one vector at a time.
	(vst4q_bf16): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
	tests.
2021-08-06 11:04:27 +01:00
Jonathan Wright 1deb0818f4 aarch64: Use memcpy to copy structures in vst2[q]_lane intrinsics
Use __builtin_memcpy to copy vector structures instead of using a
union - or constructing a new opaque structure one vector at a time -
in each of the vst2[q]_lane Neon intrinsics in arm_neon.h.

Add new code generation tests to verify that superfluous move
instructions are not generated for the vst2q_lane intrinsics.

gcc/ChangeLog:

2021-07-30  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/arm_neon.h (__ST2_LANE_FUNC): Delete.
	(__ST2Q_LANE_FUNC): Delete.
	(vst2_lane_f16): Use __builtin_memcpy to copy vector
	structure instead of constructing __builtin_aarch64_simd_oi
	one vector at a time.
	(vst2_lane_f32): Likewise.
	(vst2_lane_f64): Likewise.
	(vst2_lane_p8): Likewise.
	(vst2_lane_p16): Likewise.
	(vst2_lane_p64): Likewise.
	(vst2_lane_s8): Likewise.
	(vst2_lane_s16): Likewise.
	(vst2_lane_s32): Likewise.
	(vst2_lane_s64): Likewise.
	(vst2_lane_u8): Likewise.
	(vst2_lane_u16): Likewise.
	(vst2_lane_u32): Likewise.
	(vst2_lane_u64): Likewise.
	(vst2_lane_bf16): Likewise.
	(vst2q_lane_f16): Use __builtin_memcpy to copy vector
	structure instead of using a union.
	(vst2q_lane_f32): Likewise.
	(vst2q_lane_f64): Likewise.
	(vst2q_lane_p8): Likewise.
	(vst2q_lane_p16): Likewise.
	(vst2q_lane_p64): Likewise.
	(vst2q_lane_s8): Likewise.
	(vst2q_lane_s16): Likewise.
	(vst2q_lane_s32): Likewise.
	(vst2q_lane_s64): Likewise.
	(vst2q_lane_u8): Likewise.
	(vst2q_lane_u16): Likewise.
	(vst2q_lane_u32): Likewise.
	(vst2q_lane_u64): Likewise.
	(vst2q_lane_bf16): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
	tests.
2021-08-06 11:04:13 +01:00
Jonathan Wright 344f879c66 aarch64: Use memcpy to copy structures in vst3[q]_lane intrinsics
Use __builtin_memcpy to copy vector structures instead of using a
union - or constructing a new opaque structure one vector at a time -
in each of the vst3[q]_lane Neon intrinsics in arm_neon.h.

Add new code generation tests to verify that superfluous move
instructions are not generated for the vst3q_lane intrinsics.

gcc/ChangeLog:

2021-07-30  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/arm_neon.h (__ST3_LANE_FUNC): Delete.
	(__ST3Q_LANE_FUNC): Delete.
	(vst3_lane_f16): Use __builtin_memcpy to copy vector
	structure instead of constructing __builtin_aarch64_simd_ci
	one vector at a time.
	(vst3_lane_f32): Likewise.
	(vst3_lane_f64): Likewise.
	(vst3_lane_p8): Likewise.
	(vst3_lane_p16): Likewise.
	(vst3_lane_p64): Likewise.
	(vst3_lane_s8): Likewise.
	(vst3_lane_s16): Likewise.
	(vst3_lane_s32): Likewise.
	(vst3_lane_s64): Likewise.
	(vst3_lane_u8): Likewise.
	(vst3_lane_u16): Likewise.
	(vst3_lane_u32): Likewise.
	(vst3_lane_u64): Likewise.
	(vst3_lane_bf16): Likewise.
	(vst3q_lane_f16): Use __builtin_memcpy to copy vector
	structure instead of using a union.
	(vst3q_lane_f32): Likewise.
	(vst3q_lane_f64): Likewise.
	(vst3q_lane_p8): Likewise.
	(vst3q_lane_p16): Likewise.
	(vst3q_lane_p64): Likewise.
	(vst3q_lane_s8): Likewise.
	(vst3q_lane_s16): Likewise.
	(vst3q_lane_s32): Likewise.
	(vst3q_lane_s64): Likewise.
	(vst3q_lane_u8): Likewise.
	(vst3q_lane_u16): Likewise.
	(vst3q_lane_u32): Likewise.
	(vst3q_lane_u64): Likewise.
	(vst3q_lane_bf16): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
	tests.
2021-08-06 11:03:58 +01:00
Jonathan Wright a607592694 aarch64: Use memcpy to copy structures in vst4[q]_lane intrinsics
Use __builtin_memcpy to copy vector structures instead of using a
union - or constructing a new opaque structure one vector at a time -
in each of the vst4[q]_lane Neon intrinsics in arm_neon.h.

Add new code generation tests to verify that superfluous move
instructions are not generated for the vst4q_lane intrinsics.

gcc/ChangeLog:

2021-07-29  Jonathan Wright  <jonathan.wright@arm.com>

	* config/aarch64/arm_neon.h (__ST4_LANE_FUNC): Delete.
	(__ST4Q_LANE_FUNC): Delete.
	(vst4_lane_f16): Use __builtin_memcpy to copy vector
	structure instead of constructing __builtin_aarch64_simd_xi
	one vector at a time.
	(vst4_lane_f32): Likewise.
	(vst4_lane_f64): Likewise.
	(vst4_lane_p8): Likewise.
	(vst4_lane_p16): Likewise.
	(vst4_lane_p64): Likewise.
	(vst4_lane_s8): Likewise.
	(vst4_lane_s16): Likewise.
	(vst4_lane_s32): Likewise.
	(vst4_lane_s64): Likewise.
	(vst4_lane_u8): Likewise.
	(vst4_lane_u16): Likewise.
	(vst4_lane_u32): Likewise.
	(vst4_lane_u64): Likewise.
	(vst4_lane_bf16): Likewise.
	(vst4q_lane_f16): Use __builtin_memcpy to copy vector
	structure instead of using a union.
	(vst4q_lane_f32): Likewise.
	(vst4q_lane_f64): Likewise.
	(vst4q_lane_p8): Likewise.
	(vst4q_lane_p16): Likewise.
	(vst4q_lane_p64): Likewise.
	(vst4q_lane_s8): Likewise.
	(vst4q_lane_s16): Likewise.
	(vst4q_lane_s32): Likewise.
	(vst4q_lane_s64): Likewise.
	(vst4q_lane_u8): Likewise.
	(vst4q_lane_u16): Likewise.
	(vst4q_lane_u32): Likewise.
	(vst4q_lane_u64): Likewise.
	(vst4q_lane_bf16): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
	tests.
2021-08-06 11:01:52 +01:00
Martin Liska 318113a961 rs6000: Fix restored rs6000_long_double_type_size
As mentioned in the "Fallout: save/restore target options in handle_optimize_attribute"
thread, we need to support target option restore
of rs6000_long_double_type_size == FLOAT_PRECISION_TFmode.

gcc/ChangeLog:

	* config/rs6000/rs6000.c (rs6000_option_override_internal): When
	a target option is restored, it can have
	rs6000_long_double_type_size set to FLOAT_PRECISION_TFmode
	and error should not be emitted.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pragma-optimize.c: New test.
2021-08-06 11:03:20 +02:00
Richard Biener fd351c76c2 Fixup gfortran.dg/vect/vect-8.f90 for aarch64
With the emulated gather changes we now consistently vectorize
for aarch64 and we can remove the SVE special-casing.

2021-08-06  Richard Biener  <rguenther@suse.de>

	* gfortran.dg/vect/vect-8.f90: Simplify aarch64 scanning.
2021-08-06 08:43:50 +02:00
Sebastian Huber 9124bbe185 gcov: Add __gcov_info_to_gcda()
Add __gcov_info_to_gcda() to libgcov to get the gcda data for a gcda info in a
freestanding environment.  It is intended to be used with the
-fprofile-info-section option.  A crude test program which doesn't use a linker
script is (use "gcc -coverage -fprofile-info-section -lgcov test.c" to compile
it):

  #include <gcov.h>
  #include <stdio.h>
  #include <stdlib.h>

  extern const struct gcov_info *my_info;

  static void
  filename (const char *f, void *arg)
  {
    printf("filename: %s\n", f);
  }

  static void
  dump (const void *d, unsigned n, void *arg)
  {
    const unsigned char *c = d;

    for (unsigned i = 0; i < n; ++i)
      printf ("%02x", c[i]);
  }

  static void *
  allocate (unsigned length, void *arg)
  {
    return malloc (length);
  }

  int main()
  {
    __asm__ volatile (".set my_info, .LPBX2");
    __gcov_info_to_gcda (my_info, filename, dump, allocate, NULL);
    return 0;
  }

With this patch, <stdint.h> is included in libgcov-driver.c even if
inhibit_libc is defined.  This header file should also be available for
freestanding environments.  If this is not the case, then we have to define
intptr_t somehow.

The patch removes one use of memset() which makes the <string.h> include
superfluous.

gcc/

	* gcov-io.h (gcov_write): Declare.
	* gcov-io.c (gcov_write): New.
	(gcov_write_counter): Remove.
	(gcov_write_tag_length): Likewise.
	(gcov_write_summary): Replace gcov_write_tag_length() with calls to
	gcov_write_unsigned().
	* doc/invoke.texi (fprofile-info-section): Mention
	__gcov_info_to_gcda().

gcc/testsuite/

	* gcc.dg/gcov-info-to-gcda.c: New test.

libgcc/

	* Makefile.in (LIBGCOV_DRIVER): Add _gcov_info_to_gcda.
	* gcov.h (gcov_info): Declare.
	(__gcov_info_to_gcda): Likewise.
	* libgcov.h (gcov_write_counter): Remove.
	(gcov_write_tag_length): Likewise.
	* libgcov-driver.c (#include <stdint.h>): New.
	(#include <string.h>): Remove.
	(NEED_L_GCOV): Conditionally define.
	(NEED_L_GCOV_INFO_TO_GCDA): Likewise.
	(are_all_counters_zero): New.
	(gcov_dump_handler): Likewise.
	(gcov_allocate_handler): Likewise.
	(dump_unsigned): Likewise.
	(dump_counter): Likewise.
	(write_topn_counters): Add dump_fn, allocate_fn, and arg parameters.
	Use dump_unsigned() and dump_counter().
	(write_one_data): Add dump_fn, allocate_fn, and arg parameters.  Use
	dump_unsigned(), dump_counter(), and are_all_counters_zero().
	(__gcov_info_to_gcda): New.
2021-08-06 07:28:26 +02:00
Martin Sebor a3d3e8c362 Adjust by-value function vec arguments to by-reference.
gcc/c/ChangeLog:

	* c-parser.c (c_parser_declaration_or_fndef): Adjust by-value function
	vec arguments to by-reference.
	(c_finish_omp_declare_simd): Same.
	(c_parser_compound_statement_nostart): Same.
	(c_parser_for_statement): Same.
	(c_parser_objc_methodprotolist): Same.
	(c_parser_oacc_routine): Same.
	(c_parser_omp_for_loop): Same.
	(c_parser_omp_declare_simd): Same.

gcc/ChangeLog:

	* dominance.c (prune_bbs_to_update_dominators): Adjust by-value vec
	arguments to by-reference.
	(iterate_fix_dominators): Same.
	* dominance.h (iterate_fix_dominators): Same.
	* ipa-prop.h: Call auto_vec::to_vec_legacy.
	* tree-data-ref.c (dump_data_dependence_relation): Adjust by-value vec
	arguments to by-reference.
	(debug_data_dependence_relation): Same.
	(dump_data_dependence_relations): Same.
	* tree-data-ref.h (debug_data_dependence_relation): Same.
	(dump_data_dependence_relations): Same.
	* tree-predcom.c (dump_chains): Same.
	(initialize_root_vars_lm): Same.
	(determine_unroll_factor): Same.
	(replace_phis_by_defined_names): Same.
	(insert_init_seqs): Same.
	(pcom_worker::tree_predictive_commoning_loop): Call
	 auto_vec::to_vec_legacy.
	* tree-ssa-pre.c (insert_into_preds_of_block): Adjust by-value vec
	arguments to by-reference.
	* tree-ssa-threadbackward.c (populate_worklist): Same.
	(back_threader::resolve_def): Same.
	* tree-vect-data-refs.c (vect_check_nonzero_value): Same.
	(vect_enhance_data_refs_alignment): Same.
	(vect_check_lower_bound): Same.
	(vect_prune_runtime_alias_test_list): Same.
	(vect_permute_store_chain): Same.
	* tree-vect-slp-patterns.c (vect_normalize_conj_loc): Same.
	* tree-vect-stmts.c (vect_create_vectorized_demotion_stmts): Same.
	* tree-vectorizer.h (vect_permute_store_chain): Same.
	* vec.c (test_init): New function.
	(vec_c_tests): Call new function.
	* vec.h (vec): Declare ctors, dtor, and assignment.
	(auto_vec::to_vec_legacy): New function.
	(vec::copy): Adjust initialization.
2021-08-05 20:03:38 -06:00
GCC Administrator 8ebf4fb54a Daily bump. 2021-08-06 00:16:29 +00:00
Ian Lance Taylor 582c24e9fe runtime: extend internal atomics to comply with sync/atomic
This is the gofrontend version of https://golang.org/cl/289152.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/339690
2021-08-05 11:41:57 -07:00
Jonathan Wakely c8b024fa4b libstdc++: Move [[nodiscard]] attributes again [PR101782]
Moving these nodiscard attributes to their new position made them apply to the
function type, not to the function. This meant they no longer generated
the desired -Wunused-result warnings, and were ill-formed with Clang
(but only a pedwarn with GCC).

Clang also detected ill-formed attributes in <queue> which this fixes.

Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++-v3/ChangeLog:

	PR libstdc++/101782
	* include/bits/ranges_base.h (ranges::begin, ranges::end)
	(ranges::rbegin, ranges::rend, ranges::size, ranges::ssize)
	(ranges::empty, ranges::data): Move attribute after the
	declarator-id instead of at the end of the declarator.
	* include/bits/stl_iterator.h (__gnu_cxx::__normal_iterator):
	Move attributes back to the start of the function declarator,
	but move the requires-clause to the end.
	(common_iterator): Move attribute after the declarator-id.
	* include/bits/stl_queue.h (queue): Remove ill-formed attributes
	from friend declarations that are not definitions.
	* include/std/ranges (views::all, views::filter)
	(views::transform, views::take, views::take_while,
	views::drop) (views::drop_while, views::join,
	views::lazy_split) (views::split, views::counted,
	views::common, views::reverse) (views::elements): Move
	attributes after the declarator-id.
2021-08-05 19:01:51 +01:00
Jakub Jelinek 4739344d36 libcpp: Regenerate ucnid.h using Unicode 13.0.0 files [PR100977]
The following patch (incremental to the makeucnid.c fix) regenerates
ucnid.h with https://www.unicode.org/Public/13.0.0/ucd/ files.

2021-08-05  Jakub Jelinek  <jakub@redhat.com>

	PR c++/100977
	* ucnid.h: Regenerated using Unicode 13.0.0 files.
2021-08-05 17:35:20 +02:00
Jakub Jelinek 4805b92a32 libcpp: Fix makeucnid bug with combining values [PR100977]
I've noticed in ucnid.h two adjacent lines that had all flags and combine
values identical and as such were supposed to be merged.

This is due to a bug in makeucnid.c, which records last_flag,
last_combine and really_safe of what has just been printed, but
because of a typo mishandles last_combine: it always compares against
combining_value[0], which is 0.

This has two effects on the table.  One is that the table is often
unnecessarily large, because for non-zero .combine every character has its
own record instead of adjacent characters with the same flags and combine
being merged.
The other is that sometimes the last char that has combine set doesn't
actually have it in the tables, because the code is printing entries only
upon seeing the next character and if that character does have
combining_value of 0 and flags are otherwise the same as previously printed,
it will not print anything.

The following patch fixes that.  To make clear exactly what it affects,
I've regenerated ucnid.h with the same Unicode files that were used the
last time it was regenerated.
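
As a rough illustration of the first effect, the sketch below counts how many
records a run-merging table writer would emit, with and without the stale
comparison value.  It is not the makeucnid.c code: struct ch, count_records
and the buggy switch are hypothetical names invented for this example.

```c
/* Hypothetical per-character data; not the real makeucnid.c tables.  */
struct ch
{
  unsigned flags;
  unsigned combine;
};

/* Count the records emitted when adjacent characters with equal
   (flags, combine) are merged into a single record.  If BUGGY is
   nonzero, the remembered combine value is always taken from entry 0,
   mimicking a comparison against combining_value[0].  */
unsigned
count_records (const struct ch *c, unsigned n, int buggy)
{
  unsigned records = 0;
  unsigned last_flags = 0, last_combine = 0;
  int have_last = 0;

  for (unsigned i = 0; i < n; ++i)
    if (!have_last
        || c[i].flags != last_flags
        || c[i].combine != last_combine)
      {
        /* Start a new record and remember what was just printed.  */
        ++records;
        last_flags = c[i].flags;
        last_combine = buggy ? c[0].combine : c[i].combine;
        have_last = 1;
      }

  return records;
}
```

For five characters with equal flags and combine values 0, 230, 230, 230, 0,
the correct comparison gives 3 records, while the stale one gives 4: each
character with combine 230 gets its own record, and the final character is
folded into the preceding record.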

2021-08-05  Jakub Jelinek  <jakub@redhat.com>

	PR c++/100977
	* makeucnid.c (write_table): Fix computation of last_combine.
	* ucnid.h: Regenerated using Unicode 6.3.0 files.
2021-08-05 17:34:16 +02:00
Jakub Jelinek 02e5ffd5db libgcc: Honor LDFLAGS_FOR_TARGET when linking libgcc_s
When building gcc with some specific LDFLAGS_FOR_TARGET, e.g.
LDFLAGS_FOR_TARGET=-Wl,-z,relro,-z,now
those flags propagate into linking of target shared libraries,
e.g. lib{ubsan,tsan,stdc++,quadmath,objc,lsan,itm,gphobos,gdruntime,gomp,go,gfortran,atomic,asan}.so.*
but there is one important exception, libgcc_s.so.* linking ignores it.

The following patch fixes that.

Bootstrapped/regtested on x86_64-linux with LDFLAGS_FOR_TARGET=-Wl,-z,relro,-z,now
and verified that libgcc_s.so.* is BIND_NOW when it previously wasn't, and
without any LDFLAGS_FOR_TARGET on x86_64-linux and i686-linux.
There on x86_64-linux I've verified that the libgcc_s.so.1 linking command
line for -m64 is identical except for whitespace to one without the patch,
and for the -m32 multilib $(LDFLAGS) actually supplies an extra -m32 there
that is also repeated later in the @multilib_flags@, which should be harmless.

2021-08-04  Jakub Jelinek  <jakub@redhat.com>

	* config/t-slibgcc (SHLIB_LINK): Add $(LDFLAGS).
	* config/t-slibgcc-darwin (SHLIB_LINK): Likewise.
	* config/t-slibgcc-vms (SHLIB_LINK): Likewise.
	* config/t-slibgcc-fuchsia (SHLIB_LDFLAGS): Remove $(LDFLAGS).
2021-08-05 17:32:06 +02:00
Chung-Lin Tang 0bac793ed6 openmp: Implement omp_get_device_num routine
This patch implements the omp_get_device_num library routine, specified in
OpenMP 5.0.

GOMP_DEVICE_NUM_VAR is a macro symbol that defines the name of a "device number"
variable.  The variable is defined in the device-side libgomp and has its address
returned to the host-side libgomp during device initialization; the host libgomp
then sets its value to the designated device number.
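
The handshake described above can be modelled in plain C without any
offloading.  The sketch below is only a toy: every toy_* name is hypothetical,
host and "device" share one address space, and the real libgomp copies the
value to device memory rather than writing through a pointer.

```c
#include <stddef.h>

#define TOY_N_ENTRIES 2	/* Ordinary entries in the image.  */

/* Toy stand-in for the variable named by GOMP_DEVICE_NUM_VAR.  */
static int toy_device_num_var = -1;
static int toy_entries[TOY_N_ENTRIES];

/* Toy device-side omp_get_device_num: just reads the variable.  */
int
toy_device_get_device_num (void)
{
  return toy_device_num_var;
}

/* Toy load_image hook: fills TABLE with the ordinary entries and
   appends the address of the device-number variable at the end.
   Returns the number of entries written.  */
size_t
toy_load_image (void **table)
{
  size_t n = 0;
  for (; n < TOY_N_ENTRIES; ++n)
    table[n] = &toy_entries[n];
  table[n++] = &toy_device_num_var;
  return n;
}

/* Toy host side: if the hook returned one extra entry, treat it as
   the device-number variable and store DEVICE_NUM into it.  */
void
toy_host_load (int device_num)
{
  void *table[TOY_N_ENTRIES + 1];
  size_t got = toy_load_image (table);
  if (got == TOY_N_ENTRIES + 1)
    *(int *) table[TOY_N_ENTRIES] = device_num;
}
```

After toy_host_load (3), toy_device_get_device_num () returns 3; before any
load it returns the initial value -1.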

libgomp/ChangeLog:

	* icv-device.c (omp_get_device_num): New API function, host side.
	* fortran.c (omp_get_device_num_): New interface function.
	* libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Define macro symbol.
	* libgomp.map (OMP_5.0.2): New version space with omp_get_device_num,
	omp_get_device_num_.
	* libgomp.texi (omp_get_device_num): Add documentation for new API
	function.
	* omp.h.in (omp_get_device_num): Add declaration.
	* omp_lib.f90.in (omp_get_device_num): Likewise.
	* omp_lib.h.in (omp_get_device_num): Likewise.
	* target.c (gomp_load_image_to_device): If additional entry for device
	number exists at end of returned entries from 'load_image_func' hook,
	copy the assigned device number over to the device variable.

	* config/gcn/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
	(omp_get_device_num): New API function, device side.
	* plugin/plugin-gcn.c ("symcat.h"): Add include.
	(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
	at end of returned 'target_table' entries.

	* config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Define static global.
	(omp_get_device_num): New API function, device side.
	* plugin/plugin-nvptx.c ("symcat.h"): Add include.
	(GOMP_OFFLOAD_load_image): Add addresses of device GOMP_DEVICE_NUM_VAR
	at end of returned 'target_table' entries.

	* testsuite/lib/libgomp.exp
	(check_effective_target_offload_target_intelmic): New function for
	testing for intelmic offloading.
	* testsuite/libgomp.c-c++-common/target-45.c: New test.
	* testsuite/libgomp.fortran/target10.f90: New test.
2021-08-05 23:29:03 +08:00