191594 Commits

Author SHA1 Message Date
Tobias Burnus c22f3fb780 OpenMP/C++: Permit mapping classes with virtual members [PR102204]
PR c++/102204
gcc/cp/ChangeLog:

	* decl2.cc (cp_omp_mappable_type_1): Remove check for virtual
	members as those are permitted since OpenMP 5.0.

libgomp/ChangeLog:

	* testsuite/libgomp.c++/target-virtual-1.C: New test.

gcc/testsuite/ChangeLog:

	* g++.dg/gomp/unmappable-1.C: Remove previously expected dg-message.
2022-02-10 19:03:42 +01:00
David Malcolm 2ac7b19f1e analyzer: handle more casts of string literals [PR98797]
gcc/analyzer/ChangeLog:
	PR analyzer/98797
	* region-model-manager.cc
	(region_model_manager::maybe_fold_sub_svalue): Generalize getting
	individual chars of a STRING_CST from element_region to any
	subregion which is a concrete access of a single byte from its
	parent region.
	* region.cc (region::get_relative_concrete_byte_range): New.
	* region.h (region::get_relative_concrete_byte_range): New decl.

gcc/testsuite/ChangeLog:
	PR analyzer/98797
	* gcc.dg/analyzer/casts-1.c: Mark xfails as fixed; add further
	test coverage for casts of string literals.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2022-02-10 13:00:58 -05:00
Qing Zhao b32305b41d middle-end: updating the reg use in exit block for -fzero-call-used-regs [PR100775]
In the pass_zero_call_used_regs, when updating dataflow info after adding
the register zeroing sequence in the epilogue of the function, we should
call "df_update_exit_block_uses" to update the register use information in
the exit block to include all the registers that have been zeroed.

2022-02-10  Qing Zhao  <qing.zhao@oracle.com>

gcc/ChangeLog:

	PR middle-end/100775
	* function.cc (gen_call_used_regs_seq): Call
	df_update_exit_block_uses when updating df.

gcc/testsuite/ChangeLog:

	PR middle-end/100775
	* gcc.target/arm/pr100775.c: New test.
2022-02-10 16:40:39 +00:00
Uros Bizjak 53fcc46339 i386: Fix vec_unpacks_float_lo_v4si operand constraint [PR104469]
2022-02-10  Uroš Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog:

	PR target/104469
	* config/i386/sse.md (vec_unpacks_float_lo_v4si):
	Change operand 1 constraint to register_operand.

gcc/testsuite/ChangeLog:

	PR target/104469
	* gcc.target/i386/pr104469.c: New test.
2022-02-10 17:23:59 +01:00
H.J. Lu 69febe8527 pr104458.c: Replace long with long long for -mx32
PR target/104458
	* gcc.target/i386/pr104458.c: Replace long with long long.
2022-02-10 06:28:17 -08:00
David Malcolm 8383d41d70 analyzer: fix testsuite issues seen with mingw [PR102052]
gcc/testsuite/ChangeLog:
	PR analyzer/102052
	* gcc.dg/analyzer/fields.c (size_t): Use __SIZE_TYPE__ rather than
	hardcoding long unsigned int.
	* gcc.dg/analyzer/gzio-3.c (size_t): Likewise.
	* gcc.dg/analyzer/gzio-3a.c (size_t): Likewise.
	* gcc.dg/analyzer/pr98969.c (test_1): Use __UINTPTR_TYPE__ rather
	than long int.
	(test_2): Likewise.
	* gcc.dg/analyzer/pr99716-2.c (test_mountpoint): Use "rand" rather
	than "random".
	* gcc.dg/analyzer/pr99774-1.c (size_t): Use __SIZE_TYPE__ rather
	than hardcoding long unsigned int.
	* gcc.dg/analyzer/strndup-1.c: Add MinGW to targets that don't
	implement strndup.
	* gcc.dg/analyzer/zlib-5.c (size_t): Use __SIZE_TYPE__ rather
	than hardcoding long unsigned int.
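
The substitution applied across these tests can be sketched as follows. This is a minimal illustration rather than the text of any of the tests: `my_size_t` and `my_size_t_matches` are hypothetical names, and `__SIZE_TYPE__` is assumed to be available as a compiler-predefined macro (as it is in GCC and compatible compilers).

```c
#include <stddef.h>

/* Hard-coding `long unsigned int` breaks on LLP64 targets such as 64-bit
   MinGW, where long is 32 bits wide but size_t is 64 bits wide.
   __SIZE_TYPE__ expands to whatever type the target uses for size_t.  */
typedef __SIZE_TYPE__ my_size_t;   /* hypothetical name, for illustration */

/* Returns 1 when the typedef has the same width as the real size_t.  */
int my_size_t_matches(void)
{
  return sizeof(my_size_t) == sizeof(size_t);
}
```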

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2022-02-10 09:13:46 -05:00
Patrick Palka 3d7341cd73 c++: memfn lookup consistency and dependent using-decls
Rather than not doing any filtering when filter_memfn_lookup encounters
a dependent using-decl, handle this case less imprecisely by holding on
to the members in the new lookup set that come from a base, i.e. that
could plausibly have been introduced by that using-decl, and filtering
the rest as usual.  This is still imperfect, but it's closer to the
correct answer than the previous behavior was.

gcc/cp/ChangeLog:

	* pt.cc (filter_memfn_lookup): Handle dependent USING_DECL
	better.
2022-02-10 08:54:07 -05:00
Roger Sayle 3881e1823c gfortran: Respect target's NO_DOT_IN_LABEL in trans-common.cc
This patch fixes 9 unexpected failures in the gfortran testsuite on
nvptx-none.  The issue is that gfortran's EQUIVALENCE internally uses
symbols such as "equiv.0" even on platforms that define NO_DOT_IN_LABEL.
On nvptx-none, this then results in the following error message(s):
ptxas application ptx input, fatal: Parsing error near '.0': syntax error
ptxas fatal: Ptx assembly aborted due to errors

The fix is to tweak trans-common.cc to respect the target's NO_DOT_IN_LABEL
(and NO_DOLLAR_IN_LABEL) when generating internal equiv.%d symbols.
Only the nvptx, mmix and xtensa backends define NO_DOT_IN_LABEL which
explains why no-one has spotted/fixed this issue since the problematic
code was last changed back in 2005(!).
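
A minimal sketch of the idea, assuming a preprocessor-selected format string. The macro name `EQUIV_FMT_SKETCH`, the helper `format_equiv_name`, and the two fallback spellings are hypothetical; only the "equiv.%d" form and the two target macros come from the description above.

```c
#include <stdio.h>

/* Pick a separator that the target's assembler accepts in labels.
   The fallback spellings below are assumptions for illustration.  */
#if !defined(NO_DOT_IN_LABEL)
# define EQUIV_FMT_SKETCH "equiv.%d"
#elif !defined(NO_DOLLAR_IN_LABEL)
# define EQUIV_FMT_SKETCH "equiv$%d"
#else
# define EQUIV_FMT_SKETCH "equiv_%d"
#endif

/* Writes the internal symbol name for equivalence number N into BUF and
   returns the number of characters snprintf reports.  */
int format_equiv_name(char *buf, size_t len, int n)
{
  return snprintf(buf, len, EQUIV_FMT_SKETCH, n);
}
```

With neither target macro defined, the name keeps its dot; defining NO_DOT_IN_LABEL at compile time selects one of the other spellings.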

2022-02-10  Roger Sayle  <roger@nextmovesoftware.com>
	    Tobias Burnus  <tobias@codesourcery.com>

gcc/fortran/ChangeLog
	* trans-common.cc (GFC_EQUIV_FMT): New macro respecting the
	target's NO_DOT_IN_LABEL and NO_DOLLAR_IN_LABEL preferences.
	(build_equiv_decl): Use GFC_EQUIV_FMT here.
2022-02-10 13:32:07 +00:00
Jonathan Wakely 3e539985cc libstdc++: Add atomic_fetch_xor to <stdatomic.h>
This function (and the explicit memory order version) are present in both
C++ <atomic> and C <stdatomic.h>, so should be in C++ <stdatomic.h> too.
There is a library issue incoming for this, but the resolution is
obvious.
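
For reference, the two functions as they are used from C <stdatomic.h> can be sketched like this. The wrapper names `toggle_bits` and `toggle_bits_relaxed` are hypothetical; the atomic calls themselves are the standard C11 ones.

```c
#include <stdatomic.h>

/* Toggles the bits selected by MASK with sequentially consistent ordering
   and returns the value held before the update.  */
int toggle_bits(atomic_int *flags, int mask)
{
  return atomic_fetch_xor(flags, mask);
}

/* Same operation through the explicit memory order version.  */
int toggle_bits_relaxed(atomic_int *flags, int mask)
{
  return atomic_fetch_xor_explicit(flags, mask, memory_order_relaxed);
}
```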

libstdc++-v3/ChangeLog:

	* include/c_compatibility/stdatomic.h (atomic_fetch_xor): Add
	using-declaration.
	(atomic_fetch_xor_explicit): Likewise.
	* testsuite/29_atomics/headers/stdatomic.h/c_compat.cc: Check
	arithmetic and logical operations for atomic_int.
2022-02-10 13:01:10 +00:00
Jonathan Wakely 3d5f4f76e6 libstdc++: Fix directory iterator build for newlib
When building for newlib HAVE_OPENAT and HAVE_UNLINKAT are (sometimes?)
defined, but <fcntl.h> is only included when HAVE_DIRENT_H is defined.
Since directory iterators are completely useless without <dirent.h>,
just override the HAVE_OPENAT and HAVE_UNLINKAT detection when we don't
have <dirent.h>.

libstdc++-v3/ChangeLog:

	* src/filesystem/dir-common.h (_GLIBCXX_HAVE_DIRFD): Undefine
	when <dirent.h> is not available.
	(_GLIBCXX_HAVE_UNLINKAT):  Likewise.
2022-02-10 13:01:10 +00:00
Richard Biener 0f58ba4dd6 tree-optimization/104373 - early diagnostic on unreachable code
The following improves early uninit diagnostics by computing edge
reachability using VN and ignoring unreachable blocks when looking
for uninitialized uses.  To not ICE with -fdump-tree-all the
early uninit pass needs a dumpfile since VN tries to dump statistics.
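
The kind of code this is aimed at can be sketched as below. This is an illustrative example written for this description, not the content of either testcase, and the function name `pick` is hypothetical. The only read of the uninitialized variable sits on a branch that can never execute.

```c
/* `j` is never initialized, but its only use is on the else branch of a
   condition that is always true, so that block is unreachable and a
   warning about `j` there would be a false positive.  */
int pick(void)
{
  int i;
  int j;
  if (1 == 1)
    i = 0;
  else
    i = j;   /* unreachable use */
  return i;
}
```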

2022-02-04  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/104373
	* tree-ssa-sccvn.h (do_rpo_vn): New export exposing the
	walk kind.
	* tree-ssa-sccvn.cc (do_rpo_vn): Export, get the default
	walk kind as argument.
	(run_rpo_vn): Adjust.
	(pass_fre::execute): Likewise.
	* tree-ssa-uninit.cc (warn_uninitialized_vars): Skip
	blocks not reachable.
	(execute_late_warn_uninitialized): Mark all edges as
	executable.
	(execute_early_warn_uninitialized): Use VN to compute
	executable edges.
	(pass_data_early_warn_uninitialized): Enable a dump file,
	change dump name to warn_uninit.

	* g++.dg/warn/Wuninitialized-32.C: New testcase.
	* gcc.dg/uninit-pr20644-O0.c: Remove XFAIL.
2022-02-10 10:56:14 +01:00
Richard Biener 4a8083285c middle-end/104467 - fix vector extract simplification
This fixes a bogus vector type used for a CTOR build as part of
vector extract simplification.  The code failed to consider a
CTOR of vector elements.

2022-02-10  Richard Biener  <rguenther@suse.de>

	PR middle-end/104467
	* match.pd (vector extract simplification): Multiply the
	number of CTOR elements with the number of element elements.

	* gcc.dg/torture/pr104467.c: New testcase.
2022-02-10 10:54:43 +01:00
Richard Biener 1b72d456b2 tree-optimization/104466 - fix cut&paste error preventing alias disambiguation
The following fixes a cut&paste error in disambiguating using restrict
info.  Instead of using rbase1/rbase2, which are computed for this purpose
and preserve MEM_REF bases even when they are based on a decl, the
code performs the check on the bases that drop info for those ...

2022-02-10  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/104466
	* tree-ssa-alias.cc (refs_may_alias_p_2): Use rbase1/rbase2
	for the MR_DEPENDENCE checks as intended.

	* gfortran.dg/pr104466.f90: New testcase.
2022-02-10 10:54:43 +01:00
Tom de Vries 19a13d5a1d [nvptx] Handle sm_7x shared atomic store more optimally
For sm_7x atomic stores we fall back on expand_atomic_store, but this
results in using membar.sys for shared stores.

Fix this by adding an nvptx_atomic_store insn that adds a membar.cta for a
shared store.

Tested on x86_64 with nvptx accelerator.

gcc/ChangeLog:

2022-02-02  Tom de Vries  <tdevries@suse.de>

	* config/nvptx/nvptx.md (define_insn "nvptx_atomic_store<mode>"): New
	define_insn.
	(define_expand "atomic_store<mode>"): Use nvptx_atomic_store<mode> for
	TARGET_SM70.
	(define_c_enum "unspecv"): Add UNSPECV_ST.

gcc/testsuite/ChangeLog:

2022-02-02  Tom de Vries  <tdevries@suse.de>

	* gcc.target/nvptx/atomic-store-2.c: New test.
2022-02-10 10:11:56 +01:00
Tom de Vries 3e7d4e82dc [nvptx] Handle pre-sm_7x shared atomic store using atomic exchange
The PTX ISA specifies (for pre-sm_7x) that atomic operations on shared memory
locations do not guarantee atomicity with respect to normal store instructions
to the same address.

This can be fixed by:
- inserting barriers between normal stores and atomic operations to a common
  address
- using atom.exch to store to locations accessed by other atomic operations.

It's not clearly spelled out which barriers are needed, and a barrier seems more
expensive than an atomic exchange.

Implement the pre-sm_7x shared atomic store using atomic exchange.

That includes stores using generic addressing, since those may also point to
shared memory.
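
At the source level the same idea can be sketched with C11 atomics: perform the store as an exchange and throw away the previous value. This is only an analogy for the approach described above, not the nvptx expander; `store_via_exchange` is a hypothetical name.

```c
#include <stdatomic.h>

/* Store VALUE to *ADDR as an atomic exchange, discarding the old value.
   Because the write is itself an atomic read-modify-write, it stays
   atomic with respect to other atomic operations on the same address.  */
void store_via_exchange(atomic_int *addr, int value)
{
  (void) atomic_exchange_explicit(addr, value, memory_order_seq_cst);
}
```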

Tested on x86-64 with nvptx accelerator.

gcc/ChangeLog:

2022-02-02  Tom de Vries  <tdevries@suse.de>

	* config/nvptx/nvptx-protos.h (nvptx_mem_maybe_shared_p): Declare.
	* config/nvptx/nvptx.cc (nvptx_mem_data_area): New static function.
	(nvptx_mem_maybe_shared_p): New function.
	* config/nvptx/nvptx.md (define_expand "atomic_store<mode>"): New
	define_expand.

gcc/testsuite/ChangeLog:

2022-02-02  Tom de Vries  <tdevries@suse.de>

	* gcc.target/nvptx/atomic-store-1.c: New test.
	* gcc.target/nvptx/atomic-store-3.c: New test.
	* gcc.target/nvptx/stack-atomics-run.c: Update.
2022-02-10 10:10:44 +01:00
Tom de Vries 5b2d679bbb [nvptx] Workaround sub.u16 driver JIT bug
There's an NVIDIA driver JIT bug that mishandles this code (minimized from
builtin-arith-overflow-15.c):
...
int main (void) {
  signed char r;
  unsigned char y = (unsigned char) 0x80;
  if (__builtin_sub_overflow ((unsigned char)0, (unsigned char)y, &r))
    __builtin_abort ();
  return 0;
}
...
which at ptx level minimizes to:
...
  mov.u16 r22, 0x0080;
  st.local.u16 [frame_var],r22;
  ld.local.u16 r32,[frame_var];
  sub.u16 r33,0x0000,r32;
  cvt.u32.u16 r35,r33;
...
where we expect r35 == 0x0000ff80 but get instead 0xffffff80, and where using
nvptx-none-run -O0 fixes the problem.  [ See also
https://github.com/vries/nvidia-bugs/tree/master/builtin-arith-overflow-15 . ]
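
The two values can be reproduced on the host with a small sketch. The function names are hypothetical; the first models the expected sequence (16-bit subtract, then zero extension as in cvt.u32.u16), the second models the observed result, which looks like a sign extension of the same 16-bit difference.

```c
#include <stdint.h>

/* Expected behaviour: subtract modulo 2^16, then zero-extend to 32 bits.  */
uint32_t sub_u16_zero_extended(uint16_t a, uint16_t b)
{
  uint16_t diff = (uint16_t)(a - b);
  return (uint32_t)diff;
}

/* Observed behaviour: the same 16-bit difference, but with bit 15 copied
   into the upper half as if the value had been sign-extended.  */
uint32_t sub_u16_sign_extended(uint16_t a, uint16_t b)
{
  uint32_t diff = (uint16_t)(a - b);
  return (diff & 0x8000u) ? (diff | 0xffff0000u) : diff;
}
```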

Try to work around the bug by using sub.s16 instead of sub.u16.

Tested on nvptx.

gcc/ChangeLog:

2022-02-07  Tom de Vries  <tdevries@suse.de>

	PR target/97005
	* config/nvptx/nvptx.md (define_insn "sub<mode>3"): Workaround
	driver JIT bug by using sub.s16 instead of sub.u16.
2022-02-10 09:50:49 +01:00
Tobias Burnus 9694f61219 Fortran/OpenMP: Avoid ICE for invalid char array in omp atomic [PR104329]
PR fortran/104329
gcc/fortran/ChangeLog:

	* openmp.cc (resolve_omp_atomic): Defer extra-code assert after
	other diagnostics.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/atomic-28.f90: New test.
2022-02-10 09:30:19 +01:00
Roger Sayle 6d98e83b2c nvptx: Tweak constraints on copysign instructions
Many thanks to Thomas Schwinge for confirming my hypothesis that the register
usage regression, PR target/104345, is solely due to libgcc's _muldc3 function.
In addition to the isinf functionality in the previously proposed nvptx patch at
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588453.html which
significantly reduces the number of instructions in _muldc3, the patch below
further reduces both the number of instructions and the number of explicitly
declared registers, by permitting floating point constant immediate operands
in nvptx's copysign instruction.

Fingers-crossed, the combination with all of the previously proposed nvptx
patches improves things.  Ultimately, increasing register usage from 50 to
51 registers, reducing the number of concurrent threads by ~2%, can easily
be countered if we're now executing significantly fewer instructions in each
kernel, for a net performance win.

This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
with a "make" and "make -k check" with no new failures.

gcc/ChangeLog:

	* config/nvptx/nvptx.md (copysign<mode>3): Allow immediate
	floating point constants as operands 1 and/or 2.
2022-02-10 09:01:55 +01:00
Roger Sayle 9bacd7af2e PR target/104345: Use nvptx "set" instruction for cond ? -1 : 0
This patch addresses the "increased register pressure" regression on
nvptx-none caused by my change to transition the backend to a
STORE_FLAG_VALUE = 1 target.  This improved code generation for the
more common case of producing 0/1 Boolean values, but unfortunately
made things marginally worse when a 0/-1 mask value is desired.
Unfortunately, nvptx kernels are extremely sensitive to changes in
register usage, which was observable in the reported PR.

This patch provides optimizations for -(cond ? 1 : 0), effectively
simplifying this into cond ? -1 : 0, where these ternary operators are
provided by nvptx's selp instruction, and for the specific case of
SImode, using (restoring) nvptx's "set" instruction (which avoids
the need for a predicate register).
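
The two source forms being equated look like this in C. The function names are hypothetical; both return an all-ones mask when the condition holds and zero otherwise.

```c
/* Negation of a 0/1 Boolean value: the form that regressed.  */
int neg_of_bool(int x, int y)
{
  return -(x == y ? 1 : 0);
}

/* The simplified form, selecting the 0/-1 mask directly.  */
int mask_direct(int x, int y)
{
  return x == y ? -1 : 0;
}
```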

This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
with a "make" and "make -k check" with no new failures.  Unfortunately,
the exact register usage of an nvptx kernel depends upon the version of
the CUDA drivers being used (and the hardware), but I believe this
change should resolve the PR (for Thomas) by improving code generation
for the cases that regressed.

gcc/ChangeLog:

	PR target/104345
	* config/nvptx/nvptx.md (sel_true<mode>): Fix indentation.
	(sel_false<mode>): Likewise.
	(define_code_iterator eqne): New code iterator for EQ and NE.
	(*selp<mode>_neg_<code>): New define_insn_and_split to optimize
	the negation of a selp instruction.
	(*selp<mode>_not_<code>): New define_insn_and_split to optimize
	the bitwise not of a selp instruction.
	(*setcc_int<mode>): Use set instruction for neg:SI of a selp.

gcc/testsuite/ChangeLog:

	PR target/104345
	* gcc.target/nvptx/neg-selp.c: New test case.
2022-02-10 09:01:54 +01:00
Roger Sayle f68c3de7fc nvptx: Fix and use BI mode logic instructions (e.g. and.pred)
This patch adds support for nvptx's BImode and.pred, or.pred and
xor.pred instructions.  Technically, nvptx.md previously defined
andbi3, iorbi3 and xorbi3 instructions, but the assembly language
mnemonic output for these was incorrect (e.g. and.b1) and would be
rejected by the ptxas assembler.  The most significant part of this
patch is the new define_split which teaches the compiler to actually
use these instructions when appropriate (exposing the latent bug above).

After https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587999.html,
the function:

int foo(int x, int y) { return (x==21) && (y==69); }

when compiled with -O2 produces:

                mov.u32 %r26, %ar0;
                mov.u32 %r27, %ar1;
                setp.eq.u32     %r31, %r26, 21;
                setp.eq.u32     %r34, %r27, 69;
                selp.u32        %r37, 1, 0, %r31;
                selp.u32        %r38, 1, 0, %r34;
                and.b32 %value, %r37, %r38;

with this patch we now save an extra instruction and generate:

                mov.u32 %r26, %ar0;
                mov.u32 %r27, %ar1;
                setp.eq.u32     %r31, %r26, 21;
                setp.eq.u32     %r34, %r27, 69;
                and.pred        %r39, %r34, %r31;
                selp.u32        %value, 1, 0, %r39;

This patch has been tested (on top of the patch mentioned above) on
nvptx-none hosted on x86_64-pc-linux-gnu (including newlib) with a
make and make -k check with no new failures.

gcc/ChangeLog:

	* config/nvptx/nvptx.md (any_logic): Move code iterator earlier
	in machine description.
	(logic): Move code attribute earlier in machine description.
	(ilogic): New code attribute, like logic but "ior" for IOR.
	(and<mode>3, ior<mode>3, xor<mode>3): Delete. Replace with...
	(<ilogic><mode>3): New define_insn for HSDIM logic operations.
	(<ilogic>bi3): New define_insn for BI mode logic operations.
	(define_split): Lower logic operations from integer modes to
	BI mode predicate operations.

gcc/testsuite/ChangeLog:

	* gcc.target/nvptx/bool-1.c: Update.
	* gcc.target/nvptx/bool-2.c: New test case for and.pred.
	* gcc.target/nvptx/bool-3.c: New test case for or.pred.
	* gcc.target/nvptx/bool-4.c: New test case for xor.pred.
2022-02-10 09:01:54 +01:00
Roger Sayle 26d7b8f9bd nvptx: Add support for 64-bit mul.hi (and other) instructions
Now that the middle-end MULT_HIGHPART_EXPR pieces are in place, this
patch adds support for nvptx's mul.hi.s64 and mul.hi.u64 instructions,
as previously reviewed (provisionally pre-approved) back in August 2020:
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551373.html
Since then a few things have changed, so this patch uses the new
SMUL_HIGHPART and UMUL_HIGHPART RTX expressions, but the test cases
remain the same.  Like the x86_64 backend, this patch retains the
"trunc" forms of these instructions (while the RTL optimizers/combine
may still generate them).

Given that we're rapidly approaching stage 4, I also took the liberty
of including support in nvptx.md for a few other instructions.  With
the new 64-bit highpart multiplication instructions added above, we
can now provide a define_expand for efficient 64-bit (to 128-bit)
widening multiplications.  This patch also adds support for nvptx's
testp.infinite instruction (for implementing __builtin_isinf) and
the not.pred instruction.
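
For readers unfamiliar with highpart multiplication, the following portable sketch computes the upper 64 bits of an unsigned 64x64-bit product using only 64-bit arithmetic. It illustrates the value a single unsigned highpart multiply produces; it is not code from the patch, and the name `umul_hi64` is hypothetical.

```c
#include <stdint.h>

/* Upper 64 bits of the 128-bit product A * B, built from four 32x32-bit
   partial products.  */
uint64_t umul_hi64(uint64_t a, uint64_t b)
{
  uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
  uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

  uint64_t p0 = a_lo * b_lo;   /* contributes to bits   0..63  */
  uint64_t p1 = a_lo * b_hi;   /* contributes to bits  32..95  */
  uint64_t p2 = a_hi * b_lo;   /* contributes to bits  32..95  */
  uint64_t p3 = a_hi * b_hi;   /* contributes to bits  64..127 */

  /* Carry out of bit 63 caused by summing the middle terms.  */
  uint64_t carry = ((p0 >> 32) + (uint32_t)p1 + (uint32_t)p2) >> 32;

  return p3 + (p1 >> 32) + (p2 >> 32) + carry;
}
```

Pairing this value with the ordinary low 64 bits of `a * b` gives the full 128-bit result, which is what a 64-bit to 128-bit widening multiplication needs.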

As an example of the code generation improvements, the function
int foo(double x) { return __builtin_isinf(x); }
previously generated with -O2:

                mov.f64 %r26, %ar0;
                abs.f64 %r28, %r26;
                setp.leu.f64    %r31, %r28, 0d7fefffffffffffff;
                selp.u32        %r30, 1, 0, %r31;
                mov.u32 %r29, %r30;
                cvt.u16.u8      %r35, %r29;
                mov.u16 %r33, %r35;
                xor.b16 %r32, %r33, 1;
                cvt.u32.u16     %r34, %r32;
                cvt.u32.u8      %value, %r34;

and with this patch now generates:

                mov.f64 %r23, %ar0;
                testp.infinite.f64      %r24, %r23;
                selp.u32        %value, 1, 0, %r24;

This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
(including newlib) with a make and make -k check with no new failures.

gcc/ChangeLog:

	* config/nvptx/nvptx.md (UNSPEC_ISINF): New UNSPEC.
	(one_cmplbi2): New define_insn for not.pred.
	(mulditi3): New define_expand for signed widening multiply.
	(umulditi3): New define_expand for unsigned widening multiply.
	(smul<mode>3_highpart): New define_insn for signed highpart mult.
	(umul<mode>3_highpart): New define_insn for unsigned highpart mult.
	(*smulhi3_highpart_2): Renamed from smulhi3_highpart.
	(*smulsi3_highpart_2): Renamed from smulsi3_highpart.
	(*umulhi3_highpart_2): Renamed from umulhi3_highpart.
	(*umulsi3_highpart_2): Renamed from umulsi3_highpart.
	(*setcc<mode>_from_not_bi): New define_insn.
	(*setcc_isinf<mode>): New define_insn for testp.infinite.
	(isinf<mode>2): New define_expand.

gcc/testsuite/ChangeLog:

	* gcc.target/nvptx/mul-hi64.c: New test case.
	* gcc.target/nvptx/umul-hi64.c: New test case.
	* gcc.target/nvptx/mul-wide64.c: New test case.
	* gcc.target/nvptx/umul-wide64.c: New test case.
	* gcc.target/nvptx/isinf.c: New test case.
2022-02-10 09:01:54 +01:00
Roger Sayle de12b919c7 nvptx: Expand QI mode operations using SI mode instructions
One of the unusual target features of the Nvidia PTX ISA is that it
doesn't provide QI mode (byte sized) operations or registers.  Somewhat
conventionally, 8-bit quantities are read from/written to memory using
special instructions, but stored internally using SImode (32-bit) registers.
GCC's middle-end accommodates targets without QImode optabs, by widening
operations until suitable support is found, and with the current nvptx
backend this means 16-bit HImode operations.  The inconvenience is that
nvptx is also a TARGET_TRULY_NOOP_TRUNCATION=false target, meaning that
additional instructions are required to convert between the SImode
registers used to hold QImode values, and the HImode registers used to
operate on them (and back again).  This results in a large amount of
shuffling and type conversion in code dealing with bytes, i.e. using
char or Boolean types.

This patch improves the situation by providing expanders in the nvptx
machine description to perform QImode operations natively in SImode
instead of HImode.  An alternate implementation might be to provide
some form of target hook to specify which fallback modes to use during
RTL expansion, but I think this requirement is unusual, and a solution
entirely in the nvptx backend doesn't disturb/affect other targets.
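
Why the choice of wider mode is free to change can be sketched in C: only the low 8 bits of the result are kept, so a byte-sized addition gives the same answer whether the intermediate arithmetic is 16 or 32 bits wide. The function names are hypothetical, and this is a source-level analogy rather than a model of RTL expansion.

```c
#include <stdint.h>

/* 8-bit add carried out in a 16-bit intermediate (the HImode route).  */
uint8_t add_qi_via_hi(uint8_t a, uint8_t b)
{
  uint16_t wa = a, wb = b;
  return (uint8_t)(wa + wb);
}

/* 8-bit add carried out in a 32-bit intermediate (the SImode route).  */
uint8_t add_qi_via_si(uint8_t a, uint8_t b)
{
  uint32_t wa = a, wb = b;
  return (uint8_t)(wa + wb);
}
```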

The improvements can be quite dramatic, as shown in the example below:

int foo(int x, int y) { return (x==21) && (y==69); }

previously with -O2 required 15 instructions:

                mov.u32 %r26, %ar0;
                mov.u32 %r27, %ar1;
                setp.eq.u32     %r31, %r26, 21;
                selp.u32        %r30, 1, 0, %r31;
                mov.u32 %r29, %r30;
                setp.eq.u32     %r34, %r27, 69;
                selp.u32        %r33, 1, 0, %r34;
                mov.u32 %r32, %r33;
                cvt.u16.u8      %r39, %r29;
                mov.u16 %r36, %r39;
                cvt.u16.u8      %r39, %r32;
                mov.u16 %r37, %r39;
                and.b16 %r35, %r36, %r37;
                cvt.u32.u16     %r38, %r35;
                cvt.u32.u8      %value, %r38;

with this patch, now requires only 7 instructions:

                mov.u32 %r26, %ar0;
                mov.u32 %r27, %ar1;
                setp.eq.u32     %r31, %r26, 21;
                setp.eq.u32     %r34, %r27, 69;
                selp.u32        %r37, 1, 0, %r31;
                selp.u32        %r38, 1, 0, %r34;
                and.b32 %value, %r37, %r38;

This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
(including newlib) with a make and make -k check with no new failures.

gcc/ChangeLog:

	* config/nvptx/nvptx.md (cmp<mode>): Renamed from *cmp<mode>.
	(setcc<mode>_from_bi): Additionally support QImode.
	(extendbi<mode>2): Additionally support QImode.
	(zero_extendbi<mode>2): Additionally support QImode.
	(any_sbinary, any_ubinary, any_sunary, any_uunary): New code
	iterators for signed and unsigned, binary and unary operations.
	(<sbinary>qi3, <ubinary>qi3, <sunary>qi2, <uunary>qi2): New
	expanders to perform QImode operations using SImode instructions.
	(cstoreqi4): New define_expand.
	(*ext_truncsi2_qi): New define_insn.
	(*zext_truncsi2_qi): New define_insn.

gcc/testsuite/ChangeLog:

	* gcc.target/nvptx/bool-1.c: New test case.
2022-02-10 09:01:54 +01:00
Roger Sayle 91a7e1daa7 nvptx: Improved support for HFMode including neghf2 and abshf2
This patch adds more support for _Float16 (HFmode) to the nvptx backend.
Currently negation, absolute value and floating point comparisons are
implemented by promoting to float (SFmode).  This patch adds suitable
define_insns to nvptx.md, most conditional on TARGET_SM53 (-misa=sm_53).
This patch also adds support for HFmode fused multiply-add.

One subtlety is that neghf2 and abshf2 are implemented by (HImode)
bit manipulation operations to update the sign bit.  The NVidia PTX
ISA documentation for neg.f16 and abs.f16 contains the caution
"Future implementations may comply with the IEEE 754 standard by preserving
the (NaN) payload and modifying only the sign bit".  Given the availability
of suitable replacements, I thought it best to provide IEEE 754 compliant
implementations.  If anyone observes a performance penalty from this
choice I'm happy to provide a -ffast-math variant (or revisit this
decision).
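
The bit manipulation involved can be sketched on the raw 16-bit encoding, assuming the IEEE 754 binary16 layout with the sign in bit 15. The function names are hypothetical. Because only the sign bit is touched, any NaN payload in the low bits survives.

```c
#include <stdint.h>

/* Negate a binary16 value given as raw bits: flip the sign bit only.  */
uint16_t neg_f16_bits(uint16_t bits)
{
  return (uint16_t)(bits ^ 0x8000u);
}

/* Absolute value of a binary16 value given as raw bits: clear the sign
   bit only.  */
uint16_t abs_f16_bits(uint16_t bits)
{
  return (uint16_t)(bits & 0x7fffu);
}
```

For instance, 0x3c00 encodes 1.0 and 0xbc00 encodes -1.0, while 0x7e01 is a quiet NaN whose payload bit is left intact by both operations.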

This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
(including newlib) with a make and make -k check with no new failures.

gcc/ChangeLog:

	* config/nvptx/nvptx.md (*cmpf): New define_insn.
	(cstorehf4): New define_expand.
	(fmahf4): New define_insn.
	(neghf2): New define_insn.
	(abshf2): New define_insn.

gcc/testsuite/ChangeLog:

	* gcc.target/nvptx/float16-3.c: New test case for neghf2.
	* gcc.target/nvptx/float16-4.c: New test case for abshf2.
	* gcc.target/nvptx/float16-5.c: New test case for fmahf4.
	* gcc.target/nvptx/float16-6.c: New test case.
2022-02-10 09:01:54 +01:00
Gerald Pfeifer bcbe280931 doc: Tweak the www.bitwizard.nl reference
gcc:
	* doc/install.texi (Specific): Change the www.bitwizard.nl
	reference to use https.
2022-02-10 08:59:53 +01:00
Marcel Vollweiler bbb7f8604e C, C++, Fortran, OpenMP: Add 'has_device_addr' clause to 'target' construct.
This patch adds the 'has_device_addr' clause to the OpenMP 'target' construct
which was introduced in OpenMP 5.1 (OpenMP API 5.1 specification pp. 197ff):

	has_device_addr(list)

"The has_device_addr clause indicates that its list items already have device
addresses and therefore they may be directly accessed from a target device.
If the device address of a list item is not for the device on which the target
region executes, accessing the list item inside the region results in
unspecified behavior. The list items may include array sections." (p. 200)

"A list item may not be specified in both an is_device_ptr clause and a
has_device_addr clause on the directive." (p. 202)

"A list item that appears in an is_device_ptr or a has_device_addr clause must
not be specified in any data-sharing attribute clause on the same target
construct." (p. 203)
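
A minimal usage sketch in C, assuming a compiler with OpenMP 5.1 `has_device_addr` support (for example GCC with this patch, built with -fopenmp). The function name `update_on_device` is hypothetical. The enclosing `target data` region maps `x` and, through `use_device_addr`, makes `x` refer to its device address, which the inner `target` construct then accesses directly.

```c
/* Maps x to the device, updates it there through its device address, and
   returns the value copied back to the host.  Without OpenMP support the
   pragmas are ignored and the code simply runs on the host.  */
int update_on_device(void)
{
  int x = 24;

  #pragma omp target data map(tofrom: x) use_device_addr(x)
  {
    #pragma omp target has_device_addr(x)
    {
      x = 42;
    }
  }

  return x;
}
```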

gcc/c-family/ChangeLog:

	* c-omp.cc (c_omp_split_clauses): Added OMP_CLAUSE_HAS_DEVICE_ADDR case.
	* c-pragma.h (enum pragma_kind): Added 5.1 in comment.
	(enum pragma_omp_clause): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR.

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_clause_name): Parse 'has_device_addr'
	clause.
	(c_parser_omp_variable_list): Handle array sections.
	(c_parser_omp_clause_has_device_addr): Added.
	(c_parser_omp_all_clauses): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR
	case.
	(c_parser_omp_target_exit_data): Added HAS_DEVICE_ADDR to
	OMP_CLAUSE_MASK.
	* c-typeck.cc (handle_omp_array_sections): Handle clause restrictions.
	(c_finish_omp_clauses): Handle array sections.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_omp_clause_name): Parse 'has_device_addr' clause.
	(cp_parser_omp_var_list_no_open): Handle array sections.
	(cp_parser_omp_all_clauses): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR
	case.
	(cp_parser_omp_target_update): Added HAS_DEVICE_ADDR to OMP_CLAUSE_MASK.
	* semantics.cc (handle_omp_array_sections): Handle clause restrictions.
	(finish_omp_clauses): Handle array sections.

gcc/fortran/ChangeLog:

	* dump-parse-tree.cc (show_omp_clauses): Added OMP_LIST_HAS_DEVICE_ADDR
	case.
	* gfortran.h: Added OMP_LIST_HAS_DEVICE_ADDR.
	* openmp.cc (enum omp_mask2): Added OMP_CLAUSE_HAS_DEVICE_ADDR.
	(gfc_match_omp_clauses): Parse HAS_DEVICE_ADDR clause.
	(resolve_omp_clauses): Same.
	* trans-openmp.cc (gfc_trans_omp_variable_list): Added
	OMP_LIST_HAS_DEVICE_ADDR case.
	(gfc_trans_omp_clauses): Firstprivatize array descriptors.

gcc/ChangeLog:

	* gimplify.cc (gimplify_scan_omp_clauses): Added cases for
	OMP_CLAUSE_HAS_DEVICE_ADDR
	and handle array sections.
	(gimplify_adjust_omp_clauses): Added OMP_CLAUSE_HAS_DEVICE_ADDR case.
	* omp-low.cc (scan_sharing_clauses): Handle OMP_CLAUSE_HAS_DEVICE_ADDR.
	(lower_omp_target): Same.
	* tree-core.h (enum omp_clause_code): Same.
	* tree-nested.cc (convert_nonlocal_omp_clauses): Same.
	(convert_local_omp_clauses): Same.
	* tree-pretty-print.cc (dump_omp_clause): Same.
	* tree.cc: Same.

libgomp/ChangeLog:

	* libgomp.texi: Updated entry for HAS_DEVICE_ADDR.
	* target.c (copy_firstprivate_data): Copy only if host address is not
	NULL.
	* testsuite/libgomp.c++/target-has-device-addr-2.C: New test.
	* testsuite/libgomp.c++/target-has-device-addr-4.C: New test.
	* testsuite/libgomp.c++/target-has-device-addr-5.C: New test.
	* testsuite/libgomp.c++/target-has-device-addr-6.C: New test.
	* testsuite/libgomp.c-c++-common/target-has-device-addr-1.c: New test.
	* testsuite/libgomp.c/target-has-device-addr-3.c: New test.
	* testsuite/libgomp.fortran/target-has-device-addr-1.f90: New test.
	* testsuite/libgomp.fortran/target-has-device-addr-2.f90: New test.
	* testsuite/libgomp.fortran/target-has-device-addr-3.f90: New test.
	* testsuite/libgomp.fortran/target-has-device-addr-4.f90: New test.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/clauses-1.c: Added has_device_addr to test cases.
	* g++.dg/gomp/attrs-1.C: Added has_device_addr to test cases.
	* g++.dg/gomp/attrs-2.C: Added has_device_addr to test cases.
	* c-c++-common/gomp/target-has-device-addr-1.c: New test.
	* c-c++-common/gomp/target-has-device-addr-2.c: New test.
	* c-c++-common/gomp/target-is-device-ptr-1.c: New test.
	* c-c++-common/gomp/target-is-device-ptr-2.c: New test.
	* gfortran.dg/gomp/is_device_ptr-3.f90: New test.
	* gfortran.dg/gomp/target-has-device-addr-1.f90: New test.
	* gfortran.dg/gomp/target-has-device-addr-2.f90: New test.
2022-02-09 23:47:12 -08:00
Eugene Rozenfeld ba125745d9 AutoFDO: Don't try to promote indirect calls that result in recursive direct calls
AutoFDO tries to promote and inline all indirect calls that were promoted
and inlined in the original binary and that are still hot. In the included
test case, the promotion results in a direct call that is a recursive call.
inline_call and optimize_inline_calls can't handle recursive calls at this stage.
Currently, inline_call fails with a segmentation fault.

This change leaves the indirect call alone if promotion will result in a recursive call.
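
The shape of the problem can be sketched in C. This is an illustrative example, not the included test case, and all names are hypothetical: the indirect call inside `countdown` always resolves to `countdown` itself, so promoting it would produce a direct recursive call.

```c
typedef int (*step_fn)(int);

/* Function pointer through which the indirect call is made.  */
static step_fn next_step;

/* Counts N down to zero; the call through next_step targets this same
   function at run time, which is what a profile would record.  */
static int countdown(int n)
{
  if (n <= 0)
    return 0;
  return 1 + next_step(n - 1);
}

int run_countdown(int n)
{
  next_step = countdown;
  return countdown(n);
}
```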

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:
	* auto-profile.cc (afdo_indirect_call): Don't attempt to promote indirect calls
	that will result in direct recursive calls.

gcc/testsuite/ChangeLog:
	* g++.dg/tree-prof/indir-call-recursive-inlining.C: New test.
2022-02-09 23:33:10 -08:00
Andrew Pinski 41582f88ec [COMMITTED] Fix PR aarch64/104474: ICE with vector float initializers and non-consts.
The problem here is that the aarch64 back-end was placing const0_rtx
into the constant vector RTL even if the mode was a floating point mode.
The fix is instead to use CONST0_RTX and pass the mode to select the
correct zero (either const_int or const_double).

Committed as obvious after a bootstrap/test on aarch64-linux-gnu with
no regressions.

	PR target/104474

gcc/ChangeLog:

	* config/aarch64/aarch64.cc
	(aarch64_sve_expand_vector_init_handle_trailing_constants):
	Use CONST0_RTX instead of const0_rtx for the non-constant elements.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/pr104474-1.c: New test.
	* gcc.target/aarch64/sve/pr104474-2.c: New test.
	* gcc.target/aarch64/sve/pr104474-3.c: New test.
2022-02-09 16:49:33 -08:00
GCC Administrator 3adf509fe6 Daily bump.
2022-02-10 00:16:27 +00:00
David Malcolm 91b27d984c analyzer: more uninit test coverage
In addition to other test coverage, this adds the examples from
  https://cwe.mitre.org/data/definitions/457.html
(aka "CWE-457: Use of Uninitialized Variable")

For reference, the output from -fanalyzer looks like this
(after stripping away the DejaGnu directives):

uninit-CWE-457-examples.c: In function 'example_2_bad_code':
uninit-CWE-457-examples.c:56:3: warning: use of uninitialized value 'bN' [CWE-457] [-Wanalyzer-use-of-uninitialized-value]
   56 |   repaint(aN, bN); /* { dg-warning "use of uninitialized value 'bN'" } */
      |   ^~~~~~~~~~~~~~~
  'example_2_bad_code': events 1-4
    |
    |   34 |   int aN, bN;
    |      |           ^~
    |      |           |
    |      |           (1) region created on stack here
    |   35 |   switch (ctl) {
    |      |   ~~~~~~
    |      |   |
    |      |   (2) following 'default:' branch...
    |......
    |   51 |   default:
    |      |   ~~~~~~~
    |      |   |
    |      |   (3) ...to here
    |......
    |   56 |   repaint(aN, bN);
    |      |   ~~~~~~~~~~~~~~~
    |      |   |
    |      |   (4) use of uninitialized value 'bN' here
    |
uninit-CWE-457-examples.c: In function 'example_3_bad_code':
uninit-CWE-457-examples.c:95:3: warning: use of uninitialized value 'test_string' [CWE-457] [-Wanalyzer-use-of-uninitialized-value]
   95 |   printf("%s", test_string);
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~
  'example_3_bad_code': events 1-4
    |
    |   90 |   char *test_string;
    |      |         ^~~~~~~~~~~
    |      |         |
    |      |         (1) region created on stack here
    |   91 |   if (i != err_val)
    |      |      ~
    |      |      |
    |      |      (2) following 'false' branch (when 'i == err_val')...
    |......
    |   95 |   printf("%s", test_string);
    |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |   |
    |      |   (3) ...to here
    |      |   (4) use of uninitialized value 'test_string' here
    |

gcc/testsuite/ChangeLog:
	* gcc.dg/analyzer/uninit-1.c: Add test coverage for shifts,
	comparisons, +, -, *, /, and __builtin_strlen.
	* gcc.dg/analyzer/uninit-CWE-457-examples.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2022-02-09 17:37:39 -05:00
Ian Lance Taylor e50a79552d compiler: don't warn for print()
We used to warn for calls to print(), because it doesn't do anything.
However, a Go 1.18 test uses that call, and it is valid Go.  Change
the compiler to just accept it and compile it; this will produce calls
to printlock and printunlock, and nothing else.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/384355
2022-02-09 14:15:41 -08:00
Ian Lance Taylor 2e2b861e89 compiler: use nil pointer for zero length string constant
We used to pointlessly set the pointer of a zero length string
constant to point to a zero byte constant.  Instead, just use nil.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/384354
2022-02-09 14:13:33 -08:00
Ian Lance Taylor 70feb6839f compiler: treat notinheap types as not being pointers
By definition, a type that is marked notinheap doesn't contain any pointers
that the garbage collector cares about, and neither does a pointer to
such a type.  Change the type descriptors to consistently treat such
types as not being pointers, by setting ptrdata to 0 and gcdata to nil.

Change-Id: Id8466555ec493456ff5ff09f1670551414619bd2
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/384118
Trust: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2022-02-09 14:11:11 -08:00
Harald Anlauf f3ffea93ef Fortran: try simplifications during reductions of array constructors
gcc/fortran/ChangeLog:

	PR fortran/66193
	* arith.cc (reduce_binary_ac): When reducing binary expressions,
	try simplification.  Handle case of empty constructor.
	(reduce_binary_ca): Likewise.

gcc/testsuite/ChangeLog:

	PR fortran/66193
	* gfortran.dg/array_constructor_55.f90: New test.
2022-02-09 22:13:53 +01:00
Ian Lance Taylor f6ff6738fa gccgo: link static libgo against -lrt on GNU/Linux
The upcoming Go 1.18 release requires linking against -lrt on GNU/Linux
(only) in order to call timer_create and friends.

Also change gotools to link the runtime test against -lrt.

	* gospec.cc (RTLIB, RT_LIBRARY): Define.
	(lang_specific_driver): Add -lrt if linking statically on
	GNU/Linux.

	* configure.ac (RT_LIBS): Define.
	* Makefile.am (check-runtime): Set GOLIBS to $(RT_LIBS).
	* configure, Makefile.in: Regenerate.
2022-02-09 13:13:17 -08:00
Thomas Rodgers 4cf3c33981 libstdc++: Fix deadlock in atomic wait [PR104442]
This issue was observed as a deadlock in
29_atomics/atomic/wait_notify/100334.cc on vxworks. When a wait is
"laundered" (e.g. type T* does not suffice as a waitable address for the
platform's native waiting primitive), the address waited is that of the
_M_ver member of __waiter_pool_base, so several threads may wait on the
same address for unrelated atomic<T> objects. As noted in the PR, the
implementation correctly exits the wait for the thread whose data
changed, but not for any other threads waiting on the same address.

As noted in the PR, the __waiter::_M_do_wait_v member was correctly exiting,
but the other waiters were not reloading the value of _M_ver before
re-entering the wait.

Moving the spin call inside the loop accomplishes this, and is
consistent with the predicate accepting version of __waiter::_M_do_wait.

libstdc++-v3/ChangeLog:

	PR libstdc++/104442
	* include/bits/atomic_wait.h (__waiter::_M_do_wait_v): Move spin
	loop inside do loop so that threads failing the wait reload
	_M_ver.
2022-02-09 12:30:51 -08:00
David Edelsohn f0caa45aa7 testsuite: AIX fixes
gcc/testsuite/ChangeLog:

	* gcc.dg/Wstringop-overflow-69.c: Add -Wno-psabi.
	* gcc.dg/loop-unswitch-6.c: Omit -fcompare-debug on AIX.
2022-02-09 15:03:53 -05:00
H.J. Lu 354349e7d5 x86: Compile PR target/104441 tests with -march=x86-64
Compile PR target/104441 tests with -march=x86-64 to fix test failures
when GCC is configured with --with-arch=native --with-cpu=native.

	PR target/104441
	* gcc.target/i386/pr104441-1a.c: Compile with -march=x86-64.
	* gcc.target/i386/pr104441-1b.c: Likewise.
2022-02-09 11:53:40 -08:00
Jakub Jelinek 499f8d4c2b c: Fix up __builtin_assoc_barrier handling in the C FE [PR104427]
The following testcase ICEs, because when creating PAREN_EXPR for
__builtin_assoc_barrier the FE doesn't do the usual tweaks for
EXCESS_PRECISION_EXPR or C_MAYBE_CONST_EXPR.  I believe that the
declared effect of the builtin is just an association barrier, so
e.g. excess precision should still be handled as if the builtin
weren't there.

The following patch uses build_unary_op to handle those.

2022-02-09  Jakub Jelinek  <jakub@redhat.com>

	PR c/104427
	* c-parser.cc (c_parser_postfix_expression)
	<case RID_BUILTIN_ASSOC_BARRIER>: Use parser_build_unary_op
	instead of build1_loc to build PAREN_EXPR.
	* c-typeck.cc (build_unary_op): Handle PAREN_EXPR.
	* c-fold.cc (c_fully_fold_internal): Likewise.

	* gcc.dg/pr104427.c: New test.
2022-02-09 20:46:10 +01:00
Uros Bizjak 2f9ab267e7 i386: -mno-xsave should disable all relevant ISA flags [PR104462]
2022-02-09  Uroš Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog:

	PR target/104462
	* common/config/i386/i386-common.cc (OPTION_MASK_ISA2_XSAVE_UNSET):
	Also include OPTION_MASK_ISA2_AVX2_UNSET.

gcc/testsuite/ChangeLog:

	PR target/104462
	* gcc.target/i386/pr104462.c: New test.
2022-02-09 20:19:45 +01:00
Uros Bizjak 2b399dbabd i386: Force inputs to a register to avoid lowpart_subreg failure [PR104458]
Input operands can be in the form of:

	(subreg:DI (reg:V2SF 96) 0)

which chokes lowpart_subreg. Force inputs to a register, which is
preferable even when the input operand is from memory.

2022-02-09  Uroš Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog:

	PR target/104458
	* config/i386/i386-expand.cc (ix86_split_idivmod):
	Force operands[2] and operands[3] into a register.

gcc/testsuite/ChangeLog:

	PR target/104458
	* gcc.target/i386/pr104458.c: New test.
2022-02-09 20:18:48 +01:00
Jeff Law eefec38c99 Avoid using predefined insn name for instruction with different semantics
This isn't technically a regression, but it only impacts the v850 target and
fixes a long-standing code correctness issue.

As outlined in slightly more detail in the PR, the v850 is using the pattern
names "fnmasf4" and "fnmssf4" to generate fnmaf.s and fnmsf.s instructions
respectively.

Unfortunately fnmasf4 is expected to produce (-a * b) + c and
fnmssf4 (-a * b) - c.  Those v850 instructions actually negate the entire
result.
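
The mismatch can be checked with plain arithmetic. The sketch below is an illustration only: the *_expected functions follow the formulas quoted above, and the v850_*_model functions are a hypothetical reading of "negate the entire result", not a description taken from the v850 manual. All four function names are invented for this example.

```c
/* What the fnmasf4 and fnmssf4 pattern names are expected to
   compute, per the formulas above.  */
float
fnma_expected (float a, float b, float c)
{
  return (-a * b) + c;
}

float
fnms_expected (float a, float b, float c)
{
  return (-a * b) - c;
}

/* Assumed model of the v850 instructions: compute the fused
   multiply-add or multiply-subtract, then negate the whole result.  */
float
v850_fnmaf_model (float a, float b, float c)
{
  return -((a * b) + c);
}

float
v850_fnmsf_model (float a, float b, float c)
{
  return -((a * b) - c);
}
```

For a = 2, b = 3, c = 1, fnma_expected gives -5 while v850_fnmaf_model gives -7, and fnms_expected gives -7 while v850_fnmsf_model gives -5. Under this model each instruction computes the other pattern's formula, so using either one under its own standard name produces wrong code.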

The fix is trivial.  Use a different pattern name so that the combiner can
still generate those instructions, but prevent those instructions from being
used to implement GCC's notion of what fnmas and fnmss should be.

This fixes pr97040 as well as a handful of testsuite failures for the v3e5
multilib.

gcc/
	PR target/97040
	* config/v850/v850.md (*v850_fnmasf4): Renamed from fnmasf4.
	(*v850_fnmssf4): Renamed from fnmssf4.
2022-02-09 14:10:53 -05:00
Ian Lance Taylor d3f3ec5a55 -fgo-dump-spec: really name alignment field "_"
	* godump.cc (go_force_record_alignment): Really name the alignment
	field "_" (complete 2021-12-29 change).

	* gcc.misc-tests/godump-1.c: Adjust for alignment field rename.
2022-02-09 09:39:03 -08:00
Bill Schmidt ed3fea09b1 rs6000: Correct function prototypes for vec_replace_unaligned
Due to a pasto error in the documentation, vec_replace_unaligned was
implemented with the same function prototypes as vec_replace_elt.  It was
intended that vec_replace_unaligned always specify output vectors as having
type vector unsigned char, to emphasize that elements are potentially
misaligned by this built-in function.  This patch corrects the
misimplementation.

2022-02-04  Bill Schmidt  <wschmidt@linux.ibm.com>

gcc/
	* config/rs6000/rs6000-builtins.def (VREPLACE_UN_UV2DI): Change
	function prototype.
	(VREPLACE_UN_UV4SI): Likewise.
	(VREPLACE_UN_V2DF): Likewise.
	(VREPLACE_UN_V2DI): Likewise.
	(VREPLACE_UN_V4SF): Likewise.
	(VREPLACE_UN_V4SI): Likewise.
	* config/rs6000/rs6000-overload.def (VEC_REPLACE_UN): Change all
	function prototypes.
	* config/rs6000/vsx.md (vreplace_un_<mode>): Remove define_expand.
	(vreplace_un_<mode>): New define_insn.

gcc/testsuite/
	* gcc.target/powerpc/vec-replace-word-runnable.c: Handle expected
	prototypes for each call to vec_replace_unaligned.
2022-02-09 11:10:47 -06:00
Richard Sandiford 83d7e720cd aarch64: Extend vec_concat patterns to 8-byte vectors
This patch extends the previous support for 16-byte vec_concat
so that it supports pairs of 4-byte elements.  This too isn't
strictly a regression fix, since the 8-byte forms weren't affected
by the same problems as the 16-byte forms, but it leaves things in
a more consistent state.

gcc/
	* config/aarch64/iterators.md (VDCSIF): New mode iterator.
	(VDBL): Handle SF.
	(single_wx, single_type, single_dtype, dblq): New mode attributes.
	* config/aarch64/aarch64-simd.md (load_pair_lanes<mode>): Extend
	from VDC to VDCSIF.
	(store_pair_lanes<mode>): Likewise.
	(*aarch64_combine_internal<mode>): Likewise.
	(*aarch64_combine_internal_be<mode>): Likewise.
	(*aarch64_combinez<mode>): Likewise.
	(*aarch64_combinez_be<mode>): Likewise.
	* config/aarch64/aarch64.cc (aarch64_classify_address): Handle
	8-byte modes for ADDR_QUERY_LDP_STP_N.
	(aarch64_print_operand): Likewise for %y.

gcc/testsuite/
	* gcc.target/aarch64/vec-init-13.c: New test.
	* gcc.target/aarch64/vec-init-14.c: Likewise.
	* gcc.target/aarch64/vec-init-15.c: Likewise.
	* gcc.target/aarch64/vec-init-16.c: Likewise.
	* gcc.target/aarch64/vec-init-17.c: Likewise.
2022-02-09 16:57:06 +00:00
Richard Sandiford bce43c0493 aarch64: Remove move_lo/hi_quad expanders
This patch is the second of two to remove the old
move_lo/hi_quad expanders and move_hi_quad insns.

gcc/
	* config/aarch64/aarch64-simd.md (@aarch64_split_simd_mov<mode>):
	Use aarch64_combine instead of move_lo/hi_quad.  Tabify.
	(move_lo_quad_<mode>, aarch64_simd_move_hi_quad_<mode>): Delete.
	(aarch64_simd_move_hi_quad_be_<mode>, move_hi_quad_<mode>): Delete.
	(vec_pack_trunc_<mode>): Take general_operand elements and use
	aarch64_combine rather than move_lo/hi_quad to combine them.
	(vec_pack_trunc_df): Likewise.
2022-02-09 16:57:06 +00:00
Richard Sandiford 4057266ce5 aarch64: Add a general vec_concat expander
After previous patches, we have a (mostly new) group of vec_concat
patterns as well as vestiges of the old move_lo/hi_quad patterns.
(A previous patch removed the move_lo_quad insns, but we still
have the move_hi_quad insns and both sets of expanders.)

This patch is the first of two to remove the old move_lo/hi_quad
stuff.  It isn't technically a regression fix, but it seemed
better to make the changes now rather than leave things in
a half-finished and inconsistent state.

This patch defines an aarch64_vec_concat expander that coerces the
element operands into a valid form, including the ones added by the
previous patch.  This in turn lets us get rid of one move_lo/hi_quad
pair.

As a side-effect, it also means that vcombines of 2 vectors make
better use of the available forms, like vec_inits of 2 scalars
already do.

gcc/
	* config/aarch64/aarch64-protos.h (aarch64_split_simd_combine):
	Delete.
	* config/aarch64/aarch64-simd.md (@aarch64_combinez<mode>): Rename
	to...
	(*aarch64_combinez<mode>): ...this.
	(@aarch64_combinez_be<mode>): Rename to...
	(*aarch64_combinez_be<mode>): ...this.
	(@aarch64_vec_concat<mode>): New expander.
	(aarch64_combine<mode>): Use it.
	(@aarch64_simd_combine<mode>): Delete.
	* config/aarch64/aarch64.cc (aarch64_split_simd_combine): Delete.
	(aarch64_expand_vector_init): Use aarch64_vec_concat.

gcc/testsuite/
	* gcc.target/aarch64/vec-init-12.c: New test.
2022-02-09 16:57:05 +00:00
Richard Sandiford 85ac2fe44f aarch64: Add more vec_combine patterns
vec_combine is really one instruction on aarch64, provided that
the lowpart element is in the same register as the destination
vector.  This patch adds patterns for that.

The patch fixes a regression from GCC 8.  Before the patch:

int64x2_t s64q_1(int64_t a0, int64_t a1) {
  if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
    return (int64x2_t) { a1, a0 };
  else
    return (int64x2_t) { a0, a1 };
}

generated:

        fmov    d0, x0
        ins     v0.d[1], x1
        ins     v0.d[1], x1
        ret

whereas GCC 8 generated the more respectable:

        dup     v0.2d, x0
        ins     v0.d[1], x1
        ret

gcc/
	* config/aarch64/predicates.md (aarch64_reg_or_mem_pair_operand):
	New predicate.
	* config/aarch64/aarch64-simd.md (*aarch64_combine_internal<mode>)
	(*aarch64_combine_internal_be<mode>): New patterns.

gcc/testsuite/
	* gcc.target/aarch64/vec-init-9.c: New test.
	* gcc.target/aarch64/vec-init-10.c: Likewise.
	* gcc.target/aarch64/vec-init-11.c: Likewise.
2022-02-09 16:57:05 +00:00
Richard Sandiford aeef5c57f1 aarch64: Remove redundant vec_concat patterns
move_lo_quad_internal_<mode> and move_lo_quad_internal_be_<mode>
partially duplicate the later aarch64_combinez{,_be}<mode> patterns.
The duplication itself is a regression.

The only substantive differences between the two are:

* combinez uses vector MOV (ORR) instead of element MOV (DUP).
  The former seems more likely to be handled via renaming.

* combinez disparages the GPR->FPR alternative whereas move_lo_quad
  gave it equal cost.  The new test gives a token example of when
  the combinez behaviour helps.

gcc/
	* config/aarch64/aarch64-simd.md (move_lo_quad_internal_<mode>)
	(move_lo_quad_internal_be_<mode>): Delete.
	(move_lo_quad_<mode>): Use aarch64_combine<Vhalf> instead of the above.

gcc/testsuite/
	* gcc.target/aarch64/vec-init-8.c: New test.
2022-02-09 16:57:04 +00:00
Richard Sandiford 958448a944 aarch64: Generalise adjacency check for load_pair_lanes
This patch generalises the load_pair_lanes<mode> guard so that
it uses aarch64_check_consecutive_mems to check for consecutive
mems.  It also allows the pattern to be used for STRICT_ALIGNMENT
targets if the alignment is high enough.

The main aim is to avoid an inline test, for the sake of a later patch
that needs to repeat it.  Reusing aarch64_check_consecutive_mems seemed
simpler than writing an entirely new function.
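
The shape of such a check can be sketched as follows. This is a toy model under stated assumptions, not the aarch64 back-end code: struct toy_mem, toy_consecutive_mems and toy_mergeable_load_pair are hypothetical names, memory references are reduced to a base register number plus a byte offset, and the alignment threshold is passed in by the caller rather than derived from a mode.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy memory reference: BASE_REG + OFFSET, accessing SIZE bytes.  */
struct toy_mem
{
  int base_reg;
  long offset;
  long size;
};

/* Return true if M1 and M2 are adjacent accesses of equal size.
   If REVERSED is non-null, also accept M2 immediately before M1 and
   report the order through *REVERSED.  If REVERSED is null, only the
   in-order case (M1 first) is accepted.  */
bool
toy_consecutive_mems (const struct toy_mem *m1, const struct toy_mem *m2,
		      bool *reversed)
{
  if (m1->base_reg != m2->base_reg || m1->size != m2->size)
    return false;
  if (m2->offset == m1->offset + m1->size)
    {
      if (reversed != NULL)
	*reversed = false;
      return true;
    }
  if (reversed != NULL && m1->offset == m2->offset + m2->size)
    {
      *reversed = true;
      return true;
    }
  return false;
}

/* Return true if the two loads can be merged into one paired load.
   On strict-alignment targets the known alignment ALIGN must reach
   REQUIRED_ALIGN (both in bytes; the right threshold is left to the
   caller in this sketch).  */
bool
toy_mergeable_load_pair (const struct toy_mem *m1, const struct toy_mem *m2,
			 bool strict_alignment, long align,
			 long required_align)
{
  if (!toy_consecutive_mems (m1, m2, NULL))
    return false;
  if (strict_alignment && align < required_align)
    return false;
  return true;
}
```

Passing a null REVERSED lets a caller that only cares about in-order pairs reuse the same adjacency test without supplying a dummy output variable.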

gcc/
	* config/aarch64/aarch64-protos.h (aarch64_mergeable_load_pair_p):
	Declare.
	* config/aarch64/aarch64-simd.md (load_pair_lanes<mode>): Use
	aarch64_mergeable_load_pair_p instead of inline check.
	* config/aarch64/aarch64.cc (aarch64_expand_vector_init): Likewise.
	(aarch64_check_consecutive_mems): Allow the reversed parameter
	to be null.
	(aarch64_mergeable_load_pair_p): New function.
2022-02-09 16:57:03 +00:00
Richard Sandiford fabc5d9bce aarch64: Generalise vec_set predicate
The aarch64_simd_vec_set<mode> define_insn takes memory operands,
so this patch makes the vec_set<mode> optab expander do the same.

gcc/
	* config/aarch64/aarch64-simd.md (vec_set<mode>): Allow the
	element to be an aarch64_simd_nonimmediate_operand.
2022-02-09 16:57:02 +00:00