191912 Commits

Author SHA1 Message Date
Ian Lance Taylor
3d54f1ffaf libgo: update README.gcc
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/387514
2022-02-22 15:34:46 -08:00
Paul A. Clarke
96ee5ce5f8 rs6000: Move g++.dg/ext powerpc tests to g++.target
Also adjust DejaGnu directives, as specifically requiring "powerpc*-*-*" is no
longer required.

2021-02-22  Paul A. Clarke  <pc@us.ibm.com>

gcc/testsuite
	* g++.dg/ext/altivec-1.C: Move to g++.target/powerpc, adjust dg
	directives.
	* g++.dg/ext/altivec-2.C: Likewise.
	* g++.dg/ext/altivec-3.C: Likewise.
	* g++.dg/ext/altivec-4.C: Likewise.
	* g++.dg/ext/altivec-5.C: Likewise.
	* g++.dg/ext/altivec-6.C: Likewise.
	* g++.dg/ext/altivec-7.C: Likewise.
	* g++.dg/ext/altivec-8.C: Likewise.
	* g++.dg/ext/altivec-9.C: Likewise.
	* g++.dg/ext/altivec-10.C: Likewise.
	* g++.dg/ext/altivec-11.C: Likewise.
	* g++.dg/ext/altivec-12.C: Likewise.
	* g++.dg/ext/altivec-13.C: Likewise.
	* g++.dg/ext/altivec-14.C: Likewise.
	* g++.dg/ext/altivec-15.C: Likewise.
	* g++.dg/ext/altivec-16.C: Likewise.
	* g++.dg/ext/altivec-17.C: Likewise.
	* g++.dg/ext/altivec-18.C: Likewise.
	* g++.dg/ext/altivec-cell-1.C: Likewise.
	* g++.dg/ext/altivec-cell-2.C: Likewise.
	* g++.dg/ext/altivec-cell-3.C: Likewise.
	* g++.dg/ext/altivec-cell-4.C: Likewise.
	* g++.dg/ext/altivec-cell-5.C: Likewise.
	* g++.dg/ext/altivec-types-1.C: Likewise.
	* g++.dg/ext/altivec-types-2.C: Likewise.
	* g++.dg/ext/altivec-types-3.C: Likewise.
	* g++.dg/ext/altivec-types-4.C: Likewise.
	* g++.dg/ext/undef-bool-1.C: Likewise.
2022-02-22 17:26:15 -06:00
Harald Anlauf
bc66b471d1 Fortran: skip compile-time shape check if constructor shape is not known
gcc/fortran/ChangeLog:

	PR fortran/104619
	* resolve.cc (resolve_structure_cons): Skip shape check if shape
	of constructor cannot be determined at compile time.

gcc/testsuite/ChangeLog:

	PR fortran/104619
	* gfortran.dg/derived_constructor_comps_7.f90: New test.
2022-02-22 21:34:58 +01:00
Roger Sayle
9d1796d82d Restore bootstrap on x86_64-pc-linux-gnu
This patch resolves the bootstrap failure on x86_64-pc-linux-gnu.

2022-02-22  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/i386/i386-expand.cc (ix86_expand_cmpxchg_loop): Restore
	bootstrap.
2022-02-22 18:17:24 +00:00
Thomas Schwinge
54f7450232 Get rid of 'gcc/omp-oacc-neuter-broadcast.cc:oacc_build_component_ref'
Clean-up for commit e2a58ed6dc5293602d0d168475109caa81ad0f0d
"openacc: Middle-end worker-partitioning support":
as of commit 2a3f9f6532bb21d8ab6f16fbe9ee603f6b1405f2
"openacc: Shared memory layout optimisation", we're no longer
running into the vectorizer ICEs for '!ADDR_SPACE_GENERIC_P'.

	gcc/
	* omp-low.cc (omp_build_component_ref): Move function...
	* omp-general.cc (omp_build_component_ref): ... here.  Remove
	'static'.
	* omp-general.h (omp_build_component_ref): Declare function.
	* omp-oacc-neuter-broadcast.cc (oacc_build_component_ref): Remove
	function.
	(build_receiver_ref, build_sender_ref): Call
	'omp_build_component_ref' instead.
2022-02-22 17:53:10 +01:00
Thomas Schwinge
0fe9176f41 Further simplify 'gcc/omp-oacc-neuter-broadcast.cc:record_field_map_t'
Now that I've resolved GCC 'hash_map' issues (a while ago already), we may
further simplify this after commit 049eda8274b7394523238b17ab12c3e2889f253e
"Avoid 'GTY' use for 'gcc/omp-oacc-neuter-broadcast.cc:field_map'": as
'hash_map' Value, directly store 'field_map_t' objects, not pointers to
manually allocated 'field_map_t' objects.

	gcc/
	* omp-oacc-neuter-broadcast.cc (record_field_map_t): Further
	simplify.  Adjust all users.
2022-02-22 17:43:39 +01:00
Thomas Schwinge
f8187b5c0d Fix OpenACC gang-redundant execution in 'libgomp.oacc-fortran/privatized-ref-2.f90'
This was a latent problem, and this commit here now resolves a regression that
after recent commit a78b1ab1df9ca44acc5638e8f9d0ae2e62bd65ed
"amdgcn: Tune default OpenMP/OpenACC GPU utilization" we had (only) seen on a
GCN offloading '-march=gfx908' system:

    {+WARNING: program timed out.+}
    [-PASS:-]{+FAIL:+} libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O0  execution test

Same for other optimization levels.

Make sure that we're not executing non-parallelized code in gang-redundant
mode, by putting these parts into their own 'parallel' constructs, which then
default to 'num_gangs(1)'.

	libgomp/
	* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Fix OpenACC
	gang-redundant execution.
2022-02-22 17:32:03 +01:00
Segher Boessenkool
537c965880 rs6000: Fix GC on rs6000.c decls for atomic handling (PR88134)
In PR88134 it is pointed out that we do not have GTY markup for some
variables we use for atomic.  So, let's add that.

2022-02-22  Segher Boessenkool  <segher@kernel.crashing.org>

	PR target/88134
	* config/rs6000/rs6000.cc (atomic_hold_decl, atomic_clear_decl,
	atomic_update_decl): Add GTY markup.
2022-02-22 16:20:23 +00:00
Christophe Lyon
e9f8443a91 arm: Add VPR_REG to ALL_REGS
VPR_REG should be part of ALL_REGS, this patch fixes this omission.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  <christophe.lyon@arm.com>

	gcc/
	* config/arm/arm.h (REG_CLASS_CONTENTS): Add VPR_REG to ALL_REGS.
2022-02-22 15:55:09 +00:00
Christophe Lyon
c6b4ea7ab1 arm: Convert more MVE/CDE builtins to predicate qualifiers
This patch covers a few non-load/store builtins where we do not use
the <mode> iterator and thus we cannot use <MVE_vpred>.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  <christophe.lyon@arm.com>

	gcc/
	PR target/100757
	PR target/101325
	* config/arm/arm-builtins.cc (CX_UNARY_UNONE_QUALIFIERS): Use
	predicate.
	(CX_BINARY_UNONE_QUALIFIERS): Likewise.
	(CX_TERNARY_UNONE_QUALIFIERS): Likewise.
	(TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS): Delete.
	(QUADOP_NONE_NONE_NONE_NONE_UNONE_QUALIFIERS): Delete.
	(QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE_QUALIFIERS): Delete.
	* config/arm/arm_mve_builtins.def: Use predicated qualifiers.
	* config/arm/mve.md: Use VxBI instead of HI.
2022-02-22 15:55:09 +00:00
Christophe Lyon
6a7c13a0cf arm: Convert more load/store MVE builtins to predicate qualifiers
This patch covers a few builtins where we do not use the <mode>
iterator and thus we cannot use <MVE_vpred>.

For v2di instructions, we keep the HI mode for predicates.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  <christophe.lyon@arm.com>

	gcc/
	PR target/100757
	PR target/101325
	* config/arm/arm-builtins.cc (STRSBS_P_QUALIFIERS): Use predicate
	qualifier.
	(STRSBU_P_QUALIFIERS): Likewise.
	(LDRGBS_Z_QUALIFIERS): Likewise.
	(LDRGBU_Z_QUALIFIERS): Likewise.
	(LDRGBWBXU_Z_QUALIFIERS): Likewise.
	(LDRGBWBS_Z_QUALIFIERS): Likewise.
	(LDRGBWBU_Z_QUALIFIERS): Likewise.
	(STRSBWBS_P_QUALIFIERS): Likewise.
	(STRSBWBU_P_QUALIFIERS): Likewise.
	* config/arm/mve.md: Use VxBI instead of HI.
2022-02-22 15:55:09 +00:00
Christophe Lyon
724d6566cd arm: Convert more MVE builtins to predicate qualifiers
This patch covers all builtins that have an HI operand and use the
<mode> iterator, thus we can replace HI whe <MVE_vpred>.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  <christophe.lyon@arm.com>

	gcc/
	PR target/100757
	PR target/101325
	* config/arm/arm-builtins.cc (TERNOP_UNONE_UNONE_NONE_UNONE_QUALIFIERS): Change to ...
	(TERNOP_UNONE_UNONE_NONE_PRED_QUALIFIERS): ... this.
	(TERNOP_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Change to ...
	(TERNOP_UNONE_UNONE_IMM_PRED_QUALIFIERS): ... this.
	(TERNOP_NONE_NONE_IMM_UNONE_QUALIFIERS): Change to ...
	(TERNOP_NONE_NONE_IMM_PRED_QUALIFIERS): ... this.
	(TERNOP_NONE_NONE_UNONE_UNONE_QUALIFIERS): Change to ...
	(TERNOP_NONE_NONE_UNONE_PRED_QUALIFIERS): ... this.
	(QUADOP_UNONE_UNONE_NONE_NONE_UNONE_QUALIFIERS): Change to ...
	(QUADOP_UNONE_UNONE_NONE_NONE_PRED_QUALIFIERS): ... this.
	(QUADOP_NONE_NONE_NONE_NONE_PRED_QUALIFIERS): New.
	(QUADOP_NONE_NONE_NONE_IMM_UNONE_QUALIFIERS): Change to ...
	(QUADOP_NONE_NONE_NONE_IMM_PRED_QUALIFIERS): ... this.
	(QUADOP_UNONE_UNONE_UNONE_UNONE_PRED_QUALIFIERS): New.
	(QUADOP_UNONE_UNONE_NONE_IMM_UNONE_QUALIFIERS): Change to ...
	(QUADOP_UNONE_UNONE_NONE_IMM_PRED_QUALIFIERS): ... this.
	(QUADOP_NONE_NONE_UNONE_IMM_UNONE_QUALIFIERS): Change to ...
	(QUADOP_NONE_NONE_UNONE_IMM_PRED_QUALIFIERS): ... this.
	(QUADOP_UNONE_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Change to ...
	(QUADOP_UNONE_UNONE_UNONE_IMM_PRED_QUALIFIERS): ... this.
	(QUADOP_UNONE_UNONE_UNONE_NONE_UNONE_QUALIFIERS): Change to ...
	(QUADOP_UNONE_UNONE_UNONE_NONE_PRED_QUALIFIERS): ... this.
	(STRS_P_QUALIFIERS): Use predicate qualifier.
	(STRU_P_QUALIFIERS): Likewise.
	(STRSU_P_QUALIFIERS): Likewise.
	(STRSS_P_QUALIFIERS): Likewise.
	(LDRGS_Z_QUALIFIERS): Likewise.
	(LDRGU_Z_QUALIFIERS): Likewise.
	(LDRS_Z_QUALIFIERS): Likewise.
	(LDRU_Z_QUALIFIERS): Likewise.
	(QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Change to ...
	(QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_PRED_QUALIFIERS): ... this.
	(BINOP_NONE_NONE_PRED_QUALIFIERS): New.
	(BINOP_UNONE_UNONE_PRED_QUALIFIERS): New.
	* config/arm/arm_mve_builtins.def: Use new predicated qualifiers.
	* config/arm/mve.md: Use MVE_VPRED instead of HI.
2022-02-22 15:55:08 +00:00
Christophe Lyon
e6a4aefce8 arm: Convert remaining MVE vcmp builtins to predicate qualifiers
This is mostly a mechanical change, only tested by the intrinsics
expansion tests.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  <christophe.lyon@arm.com>

	gcc/
	PR target/100757
	PR target/101325
	* config/arm/arm-builtins.cc (BINOP_UNONE_NONE_NONE_QUALIFIERS):
	Delete.
	(TERNOP_UNONE_NONE_NONE_UNONE_QUALIFIERS): Change to ...
	(TERNOP_PRED_NONE_NONE_PRED_QUALIFIERS): ... this.
	(TERNOP_PRED_UNONE_UNONE_PRED_QUALIFIERS): New.
	* config/arm/arm_mve_builtins.def (vcmp*q_n_, vcmp*q_m_f): Use new
	predicated qualifiers.
	* config/arm/mve.md (mve_vcmp<mve_cmp_op>q_n_<mode>)
	(mve_vcmp*q_m_f<mode>): Use MVE_VPRED instead of HI.
2022-02-22 15:55:08 +00:00
Christophe Lyon
df0e57c2c0 arm: Fix vcond_mask expander for MVE (PR target/100757)
The problem in this PR is that we call VPSEL with a mask of vector
type instead of HImode. This happens because operand 3 in vcond_mask
is the pre-computed vector comparison and has vector type.

This patch fixes it by implementing TARGET_VECTORIZE_GET_MASK_MODE,
returning the appropriate VxBI mode when targeting MVE.  In turn, this
implies implementing vec_cmp<mode><MVE_vpred>,
vec_cmpu<mode><MVE_vpred> and vcond_mask_<mode><MVE_vpred>, and we can
move vec_cmp<mode><v_cmp_result>, vec_cmpu<mode><mode> and
vcond_mask_<mode><v_cmp_result> back to neon.md since they are not
used by MVE anymore.  The new *<MVE_vpred> patterns listed above are
implemented in mve.md since they are only valid for MVE. However this
may make maintenance/comparison more painful than having all of them
in vec-common.md.

In the process, we can get rid of the recently added vcond_mve
parameter of arm_expand_vector_compare.

Compared to neon.md's vcond_mask_<mode><v_cmp_result> before my "arm:
Auto-vectorization for MVE: vcmp" patch (r12-834), it keeps the VDQWH
iterator added in r12-835 (to have V4HF/V8HF support), as well as the
(!<Is_float_mode> || flag_unsafe_math_optimizations) condition which
was not present before r12-834 although SF modes were enabled by VDQW
(I think this was a bug).

Using TARGET_VECTORIZE_GET_MASK_MODE has the advantage that we no
longer need to generate vpsel with vectors of 0 and 1: the masks are
now merged via scalar 'ands' instructions operating on 16-bit masks
after converting the boolean vectors.

In addition, this patch fixes a problem in arm_expand_vcond() where
the result would be a vector of 0 or 1 instead of operand 1 or 2.

Since we want to skip gcc.dg/signbit-2.c for MVE, we also add a new
arm_mve effective target.

Reducing the number of iterations in pr100757-3.c from 32 to 8, we
generate the code below:

float a[32];
float fn1(int d) {
  float c = 4.0f;
  for (int b = 0; b < 8; b++)
    if (a[b] != 2.0f)
      c = 5.0f;
  return c;
}

fn1:
	ldr     r3, .L3+48
	vldr.64 d4, .L3              // q2=(2.0,2.0,2.0,2.0)
	vldr.64 d5, .L3+8
	vldrw.32        q0, [r3]     // q0=a(0..3)
	adds    r3, r3, #16
	vcmp.f32        eq, q0, q2   // cmp a(0..3) == (2.0,2.0,2.0,2.0)
	vldrw.32        q1, [r3]     // q1=a(4..7)
	vmrs     r3, P0
	vcmp.f32        eq, q1, q2   // cmp a(4..7) == (2.0,2.0,2.0,2.0)
	vmrs    r2, P0  @ movhi
	ands    r3, r3, r2           // r3=select(a(0..3]) & select(a(4..7))
	vldr.64 d4, .L3+16           // q2=(5.0,5.0,5.0,5.0)
	vldr.64 d5, .L3+24
	vmsr     P0, r3
	vldr.64 d6, .L3+32           // q3=(4.0,4.0,4.0,4.0)
	vldr.64 d7, .L3+40
	vpsel q3, q3, q2             // q3=vcond_mask(4.0,5.0)
	vmov.32 r2, q3[1]            // keep the scalar max
	vmov.32 r0, q3[3]
	vmov.32 r3, q3[2]
	vmov.f32        s11, s12
	vmov    s15, r2
	vmov    s14, r3
	vmaxnm.f32      s15, s11, s15
	vmaxnm.f32      s15, s15, s14
	vmov    s14, r0
	vmaxnm.f32      s15, s15, s14
	vmov    r0, s15
	bx      lr
	.L4:
	.align  3
	.L3:
	.word   1073741824	// 2.0f
	.word   1073741824
	.word   1073741824
	.word   1073741824
	.word   1084227584	// 5.0f
	.word   1084227584
	.word   1084227584
	.word   1084227584
	.word   1082130432	// 4.0f
	.word   1082130432
	.word   1082130432
	.word   1082130432

This patch adds tests that trigger an ICE without this fix.

The pr100757*.c testcases are derived from
gcc.c-torture/compile/20160205-1.c, forcing the use of MVE, and using
various types and return values different from 0 and 1 to avoid
commonalization with boolean masks.  In addition, since we should not
need these masks, the tests make sure they are not present.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  <christophe.lyon@arm.com>

	PR target/100757
	gcc/
	* config/arm/arm-protos.h (arm_get_mask_mode): New prototype.
	(arm_expand_vector_compare): Update prototype.
	* config/arm/arm.cc (TARGET_VECTORIZE_GET_MASK_MODE): New.
	(arm_vector_mode_supported_p): Add support for VxBI modes.
	(arm_expand_vector_compare): Remove useless generation of vpsel.
	(arm_expand_vcond): Fix select operands.
	(arm_get_mask_mode): New.
	* config/arm/mve.md (vec_cmp<mode><MVE_vpred>): New.
	(vec_cmpu<mode><MVE_vpred>): New.
	(vcond_mask_<mode><MVE_vpred>): New.
	* config/arm/vec-common.md (vec_cmp<mode><v_cmp_result>)
	(vec_cmpu<mode><mode, vcond_mask_<mode><v_cmp_result>): Move to ...
	* config/arm/neon.md (vec_cmp<mode><v_cmp_result>)
	(vec_cmpu<mode><mode, vcond_mask_<mode><v_cmp_result>): ... here
	and disable for MVE.
	* doc/sourcebuild.texi (arm_mve): Document new effective-target.

	gcc/testsuite/
	PR target/100757
	* gcc.target/arm/simd/pr100757-2.c: New.
	* gcc.target/arm/simd/pr100757-3.c: New.
	* gcc.target/arm/simd/pr100757-4.c: New.
	* gcc.target/arm/simd/pr100757.c: New.
	* gcc.dg/signbit-2.c: Skip when targeting ARM/MVE.
	* lib/target-supports.exp (check_effective_target_arm_mve): New.
2022-02-22 15:55:07 +00:00
Christophe Lyon
91224cf625 arm: Implement auto-vectorized MVE comparisons with vectors of boolean predicates
We make use of qualifier_predicate to describe MVE builtins
prototypes, restricting to auto-vectorizable vcmp* and vpsel builtins,
as they are exercised by the tests added earlier in the series.

Special handling is needed for mve_vpselq because it has a v2di
variant, which has no natural VPR.P0 representation: we keep HImode
for it.

The vector_compare expansion code is updated to use the right VxBI
mode instead of HI for the result.

We extend the existing thumb2_movhi_vfp and thumb2_movhi_fp16 patterns
to use the new MVE_7_HI iterator which covers HI and the new VxBI
modes, in conjunction with the new DB constraint for a constant vector
of booleans.

This patch also adds tests derived from the one provided in PR
target/101325: there is a compile-only test because I did not have
access to anything that could execute MVE code until recently.  I have
been able to add an executable test since QEMU supports MVE.

Instead of adding arm_v8_1m_mve_hw, I update arm_mve_hw so that it
uses add_options_for_arm_v8_1m_mve_fp, like arm_neon_hw does.  This
ensures arm_mve_hw passes even if the toolchain does not generate MVE
code by default.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon <christophe.lyon@arm.com>
	    Richard Sandiford  <richard.sandiford@arm.com>

	gcc/
	PR target/100757
	PR target/101325
	* config/arm/arm-builtins.cc (BINOP_PRED_UNONE_UNONE_QUALIFIERS)
	(BINOP_PRED_NONE_NONE_QUALIFIERS)
	(TERNOP_NONE_NONE_NONE_PRED_QUALIFIERS)
	(TERNOP_UNONE_UNONE_UNONE_PRED_QUALIFIERS): New.
	* config/arm/arm-protos.h (mve_bool_vec_to_const): New.
	* config/arm/arm.cc (arm_hard_regno_mode_ok): Handle new VxBI
	modes.
	(arm_mode_to_pred_mode): New.
	(arm_expand_vector_compare): Use the right VxBI mode instead of
	HI.
	(arm_expand_vcond): Likewise.
	(simd_valid_immediate): Handle MODE_VECTOR_BOOL.
	(mve_bool_vec_to_const): New.
	(neon_make_constant): Call mve_bool_vec_to_const when needed.
	* config/arm/arm_mve_builtins.def (vcmpneq_, vcmphiq_, vcmpcsq_)
	(vcmpltq_, vcmpleq_, vcmpgtq_, vcmpgeq_, vcmpeqq_, vcmpneq_f)
	(vcmpltq_f, vcmpleq_f, vcmpgtq_f, vcmpgeq_f, vcmpeqq_f, vpselq_u)
	(vpselq_s, vpselq_f): Use new predicated qualifiers.
	* config/arm/constraints.md (DB): New.
	* config/arm/iterators.md (MVE_7, MVE_7_HI): New mode iterators.
	(MVE_VPRED, MVE_vpred): New attribute iterators.
	* config/arm/mve.md (@mve_vcmp<mve_cmp_op>q_<mode>)
	(@mve_vcmp<mve_cmp_op>q_f<mode>, @mve_vpselq_<supf><mode>)
	(@mve_vpselq_f<mode>): Use MVE_VPRED instead of HI.
	(@mve_vpselq_<supf>v2di): Define separately.
	(mov<mode>): New expander for VxBI modes.
	* config/arm/vfp.md (thumb2_movhi_vfp, thumb2_movhi_fp16): Use
	MVE_7_HI iterator and add support for DB constraint.

	gcc/testsuite/
	PR target/100757
	PR target/101325
	* gcc.dg/rtl/arm/mve-vxbi.c: New test.
	* gcc.target/arm/simd/pr101325.c: New.
	* gcc.target/arm/simd/pr101325-2.c: New.
	* lib/target-supports.exp (check_effective_target_arm_mve_hw): Use
	add_options_for_arm_v8_1m_mve_fp.
2022-02-22 15:55:07 +00:00
Christophe Lyon
884f77b422 arm: Implement MVE predicates as vectors of booleans
This patch implements support for vectors of booleans to support MVE
predicates, instead of HImode.  Since the ABI mandates pred16_t (aka
uint16_t) to represent predicates in intrinsics prototypes, we
introduce a new "predicate" type qualifier so that we can map relevant
builtins HImode arguments and return value to the appropriate vector
of booleans (VxBI).

We have to update test_vector_ops_duplicate, because it iterates using
an offset in bytes, where we would need to iterate in bits: we stop
iterating when we reach the end of the vector of booleans.

In addition, we have to fix the underlying definition of vectors of
booleans because ARM/MVE needs a different representation than
AArch64/SVE. With ARM/MVE the 'true' bit is duplicated over the
element size, so that a true element of V4BI is represented by
'0b1111'.  This patch updates the aarch64 definition of VNx*BI as
needed.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  <christophe.lyon@arm.com>
	    Richard Sandiford  <richard.sandiford@arm.com>

	gcc/
	PR target/100757
	PR target/101325
	* config/aarch64/aarch64-modes.def (VNx16BI, VNx8BI, VNx4BI,
	VNx2BI): Update definition.
	* config/arm/arm-builtins.cc (arm_init_simd_builtin_types): Add new
	simd types.
	(arm_init_builtin): Map predicate vectors arguments to HImode.
	(arm_expand_builtin_args): Move HImode predicate arguments to VxBI
	rtx. Move return value to HImode rtx.
	* config/arm/arm-builtins.h (arm_type_qualifiers): Add qualifier_predicate.
	* config/arm/arm-modes.def (B2I, B4I, V16BI, V8BI, V4BI): New modes.
	* config/arm/arm-simd-builtin-types.def (Pred1x16_t,
	Pred2x8_t,Pred4x4_t): New.
	* emit-rtl.cc (init_emit_once): Handle all boolean modes.
	* genmodes.cc (mode_data): Add boolean field.
	(blank_mode): Initialize it.
	(make_complex_modes): Fix handling of boolean modes.
	(make_vector_modes): Likewise.
	(VECTOR_BOOL_MODE): Use new COMPONENT parameter.
	(make_vector_bool_mode): Likewise.
	(BOOL_MODE): New.
	(make_bool_mode): New.
	(emit_insn_modes_h): Fix generation of boolean modes.
	(emit_class_narrowest_mode): Likewise.
	* machmode.def: (VECTOR_BOOL_MODE): Document new COMPONENT
	parameter.  Use new BOOL_MODE instead of FRACTIONAL_INT_MODE to
	define BImode.
	* rtx-vector-builder.cc (rtx_vector_builder::find_cached_value):
	Fix handling of constm1_rtx for VECTOR_BOOL.
	* simplify-rtx.cc (native_encode_rtx): Fix support for VECTOR_BOOL.
	(native_decode_vector_rtx): Likewise.
	(test_vector_ops_duplicate): Skip vec_merge test
	with vectors of booleans.
	* varasm.cc (output_constant_pool_2): Likewise.
2022-02-22 15:55:07 +00:00
Christophe Lyon
0d0aaea105 arm: Fix mve_vmvnq_n_<supf><mode> argument mode
The vmvnq_n* intrinsics and have [u]int[16|32]_t arguments, so use
<V_elem> iterator instead of HI in mve_vmvnq_n_<supf><mode>.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  <christophe.lyon@arm.com>

	gcc/
	* config/arm/mve.md (mve_vmvnq_n_<supf><mode>): Use V_elem mode
	for operand 1.
2022-02-22 15:55:06 +00:00
Christophe Lyon
6769084fdf arm: Add support for VPR_REG in arm_class_likely_spilled_p
VPR_REG is the only register in its class, so it should be handled by
TARGET_CLASS_LIKELY_SPILLED_P, which is achieved by calling
default_class_likely_spilled_p.  No test fails without this patch, but
it seems it should be implemented.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  <christophe.lyon@arm.com>

	gcc/
	* config/arm/arm.cc (arm_class_likely_spilled_p): Handle VPR_REG.
2022-02-22 15:55:06 +00:00
Christophe Lyon
bf3e36fbf1 arm: Add GENERAL_AND_VPR_REGS regclass
At some point during the development of this patch series, it appeared
that in some cases the register allocator wants “VPR or general”
rather than “VPR or general or FP” (which is the same thing as
ALL_REGS).  The series does not seem to require this anymore, but it
seems to be a good thing to do anyway, to give the register allocator
more freedom.

CLASS_MAX_NREGS and arm_hard_regno_nregs need adjustment to avoid a
regression in gcc.dg/stack-usage-1.c when compiled with -mthumb
-mfloat-abi=hard -march=armv8.1-m.main+mve.fp+fp.dp.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  <christophe.lyon@arm.com>

	gcc/
	* config/arm/arm.h (reg_class): Add GENERAL_AND_VPR_REGS.
	(REG_CLASS_NAMES): Likewise.
	(REG_CLASS_CONTENTS): Likewise.
	(CLASS_MAX_NREGS): Handle VPR.
	* config/arm/arm.cc (arm_hard_regno_nregs): Handle VPR.
2022-02-22 15:55:06 +00:00
Christophe Lyon
7b1cce9273 arm: Add new tests for comparison vectorization with Neon and MVE
This patch mainly adds Neon tests similar to existing MVE ones,
to make sure we do not break Neon when fixing MVE.

mve-vcmp-f32-2.c is similar to mve-vcmp-f32.c but uses a conditional
with 2.0f and 3.0f constants to help scan-assembler-times.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon <christophe.lyon@arm.com>

	gcc/testsuite/
	* gcc.target/arm/simd/mve-vcmp-f32-2.c: New.
	* gcc.target/arm/simd/neon-compare-1.c: New.
	* gcc.target/arm/simd/neon-compare-2.c: New.
	* gcc.target/arm/simd/neon-compare-3.c: New.
	* gcc.target/arm/simd/neon-compare-scalar-1.c: New.
	* gcc.target/arm/simd/neon-vcmp-f16.c: New.
	* gcc.target/arm/simd/neon-vcmp-f32-2.c: New.
	* gcc.target/arm/simd/neon-vcmp-f32-3.c: New.
	* gcc.target/arm/simd/neon-vcmp-f32.c: New.
	* gcc.target/arm/simd/neon-vcmp.c: New.
2022-02-22 15:55:05 +00:00
Christophe Lyon
39c0b8f1ac MAINTAINERS: Update my email address.
* MAINTAINERS (Write After Approval): Update my e-mail address.
2022-02-22 15:55:05 +00:00
Tom de Vries
5ed77fb3ed [libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end
Consider the following omp fragment.
...
  #pragma omp target
  #pragma omp parallel num_threads (2)
  #pragma omp task
    ;
...

This hangs at -O0 for nvptx.

Investigating the behaviour gives us the following trace of events:
- both threads execute GOMP_task, where they:
  - deposit a task, and
  - execute gomp_team_barrier_wake
- thread 1 executes gomp_team_barrier_wait_end and, not being the last thread,
  proceeds to wait at the team barrier
- thread 0 executes gomp_team_barrier_wait_end and, being the last thread, it
  calls gomp_barrier_handle_tasks, where it:
  - executes both tasks and marks the team barrier done
  - executes a gomp_team_barrier_wake which wakes up thread 1
- thread 1 exits the team barrier
- thread 0 returns from gomp_barrier_handle_tasks and goes to wait at
  the team barrier.
- thread 0 hangs.

To understand why there is a hang here, it's good to understand how things
are setup for nvptx.  The libgomp/config/nvptx/bar.c implementation is
a copy of the libgomp/config/linux/bar.c implementation, with uses of both
futex_wake and do_wait replaced with uses of ptx insn bar.sync:
...
  if (bar->total > 1)
    asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
...

The point where thread 0 goes to wait at the team barrier, corresponds in
the linux implementation with a do_wait.  In the linux case, the call to
do_wait doesn't hang, because it's waiting for bar->generation to become
a certain value, and if bar->generation already has that value, it just
proceeds, without any need for coordination with other threads.

In the nvtpx case, the bar.sync waits until thread 1 joins it in the same
logical barrier, which never happens: thread 1 is lingering in the
thread pool at the thread pool barrier (using a different logical barrier),
waiting to join a new team.

The easiest way to fix this is to revert to the posix implementation for
bar.{c,h}.  That however falls back on a busy-waiting approach, and
does not take advantage of the ptx bar.sync insn.

Instead, we revert to the linux implementation for bar.c,
and implement bar.c local functions futex_wait and futex_wake using the
bar.sync insn.

The bar.sync insn takes an argument specifying how many threads are
participating, and that doesn't play well with the futex syntax where it's
not clear in advance how many threads will be woken up.

This is solved by waking up all waiting threads each time a futex_wait or
futex_wake happens, and possibly going back to sleep with an updated thread
count.

Tested libgomp on x86_64 with nvptx accelerator.

libgomp/ChangeLog:

2021-04-20  Tom de Vries  <tdevries@suse.de>

	PR target/99555
	* config/nvptx/bar.c (generation_to_barrier): New function, copied
	from config/rtems/bar.c.
	(futex_wait, futex_wake): New function.
	(do_spin, do_wait): New function, copied from config/linux/wait.h.
	(gomp_barrier_wait_end, gomp_barrier_wait_last)
	(gomp_team_barrier_wake, gomp_team_barrier_wait_end):
	(gomp_team_barrier_wait_cancel_end, gomp_team_barrier_cancel): Remove
	and replace with include of config/linux/bar.c.
	* config/nvptx/bar.h (gomp_barrier_t): Add fields waiters and lock.
	(gomp_barrier_init): Init new fields.
	* testsuite/libgomp.c-c++-common/task-detach-6.c: Remove nvptx-specific
	workarounds.
	* testsuite/libgomp.c/pr99555-1.c: Same.
	* testsuite/libgomp.fortran/task-detach-6.f90: Same.
2022-02-22 15:48:03 +01:00
Tobias Burnus
bd73d8dd31 nvptx: Add -misa=sm_70
Add -misa=sm_70, and use it to specify the misa value in test-case
gcc.target/nvptx/atomic-store-2.c.

Tested on nvptx.

gcc/ChangeLog:

	* config/nvptx/nvptx-c.cc (nvptx_cpu_cpp_builtins): Handle SM70.
	* config/nvptx/nvptx.cc (first_ptx_version_supporting_sm):
	Likewise.
	* config/nvptx/nvptx.opt (misa): Add sm_70 alias PTX_ISA_SM70.

gcc/testsuite/ChangeLog:

2022-02-22  Tom de Vries  <tdevries@suse.de>

	* gcc.target/nvptx/atomic-store-2.c: Use -misa=sm_70.
	* gcc.target/nvptx/uniform-simt-3.c: Same.

Co-Authored-By: Tom de Vries <tdevries@suse.de>
2022-02-22 15:38:55 +01:00
Patrick Palka
5e1b17f038 libstdc++: Implement P2415R2 changes to viewable_range / views::all
This implements the wording changes in P2415R2 "What is a view?", which
is a DR for C++20.

libstdc++-v3/ChangeLog:

	* include/bits/ranges_base.h (__detail::__is_initializer_list):
	Define.
	(viewable_range): Adjust as per P2415R2.
	* include/bits/ranges_cmp.h (__cpp_lib_ranges): Adjust value.
	* include/std/ranges (owning_view): Define as per P2415R2.
	(enable_borrowed_range<owning_view>): Likewise.
	(views::__detail::__can_subrange): Replace with ...
	(views::__detail::__can_owning_view): ... this.
	(views::_All::_S_noexcept): Sync with operator().
	(views::_All::operator()): Use owning_view instead of subrange
	as per P2415R2.
	* include/std/version (__cpp_lib_ranges): Adjust value.
	* testsuite/std/ranges/adaptors/all.cc (test06): Adjust now that
	views::all uses owning_view instead of subrange.
	(test08): New test.
	* testsuite/std/ranges/adaptors/lazy_split.cc (test09): Adjust
	now that rvalue non-view non-borrowed ranges are viewable.
	* testsuite/std/ranges/adaptors/split.cc (test06): Likewise.
2022-02-22 09:37:58 -05:00
Tobias Burnus
bc91cb8d8c nvptx: Add -mptx=6.0
Currently supported internally are 3.1, 6.0, 6.3 and 7.0.

However, -mptx= supports 3.1, 6.3, 7.0 – but not the internal default 6.0.

Add -mptx=6.0 for consistency.

Tested on nvptx.

gcc/ChangeLog:

	* config/nvptx/nvptx.opt (mptx): Add 6.0 alias PTX_VERSION_6_0.
	* doc/invoke.texi (-mptx): Update for new values and defaults.

Co-Authored-By: Tom de Vries <tdevries@suse.de>
2022-02-22 14:57:28 +01:00
Tom de Vries
c2b23aaaf4 [nvptx] Add -mptx-comment
Add functionality that indicates which insns are added by -minit-regs, such
that for instance we have for pr53465.s:
...
        // #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
        // Start: Added by -minit-regs=3:
        // #NO_APP
                mov.u32 %r26, 0;
        // #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
        // End: Added by -minit-regs=3:
        // #NO_APP
...

Can be switched off using -mno-ptx-comment.

Tested on nvptx.

gcc/ChangeLog:

2022-02-21  Tom de Vries  <tdevries@suse.de>

	* config/nvptx/nvptx.cc (gen_comment): New function.
	(workaround_uninit_method_1, workaround_uninit_method_2)
	(workaround_uninit_method_3): : Use gen_comment.
	* config/nvptx/nvptx.opt (mptx-comment): New option.
2022-02-22 14:51:59 +01:00
Richard Biener
d669237f7d Dump def that we use for a splat
This makes the SLP vectorizer dump the def we use for a splat to
aid debugging.

2022-02-22  Richard Biener  <rguenther@suse.de>

	* tree-vect-slp.cc (vect_build_slp_tree_2): Dump the def used
	for a splat.
2022-02-22 14:28:02 +01:00
Roger Sayle
2ef0e75d0b Implement constant-folding simplifications of reductions.
This patch addresses a code quality regression in GCC 12 by implementing
some constant folding/simplification transformations for REDUC_PLUS_EXPR
in match.pd.  The motivating example is gcc.dg/vect/pr89440.c which with
-O2 -ffast-math (with vectorization now enabled) gets optimized to:

float f (float x)
{
  vector(4) float vect_x_14.11;
  vector(4) float _2;
  float _32;

  _2 = {x_9(D), 0.0, 0.0, 0.0};
  vect_x_14.11_29 = _2 + { 1.0e+1, 2.6e+1, 4.2e+1, 5.8e+1 };
  _32 = .REDUC_PLUS (vect_x_14.11_29); [tail call]
  return _32;
}

With these proposed new transformations, we can simplify the
above code even further.

float f (float x)
{
  float _32;
  _32 = x_9(D) + 1.36e+2;
  return _32;
}

[which happens to match what we'd produce with -fno-tree-vectorize,
and with GCC 11].

2022-02-22  Roger Sayle  <roger@nextmovesoftware.com>
	    Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
	* fold-const.cc (ctor_single_nonzero_element): New function to
	return the single non-zero element of a (vector) constructor.
	* fold-const.h (ctor_single_nonzero_element): Prototype here.
	* match.pd (reduc (constructor@0)): Simplify reductions of a
	constructor containing a single non-zero element.
	(reduc (@0 op VECTOR_CST) ->  (reduc @0) op CONST): Simplify
	reductions of vector operations of the same operator with
	constant vector operands.

gcc/testsuite/ChangeLog
	* gcc.dg/fold-reduc-1.c: New test case.
2022-02-22 12:32:22 +00:00
Jakub Jelinek
2f59f06761 libiberty: Fix up debug.temp.o creation if *.o has 64K+ sections [PR104617]
On
 #define A(n) int foo1##n(void) { return 1##n; }
 #define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
 #define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
 #define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
 #define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
 E(0) E(1) E(2) D(30) D(31) C(320) C(321) C(322) C(323) C(324) C(325)
 B(3260) B(3261) B(3262) B(3263) A(32640) A(32641) A(32642)
testcase with
./xgcc -B ./ -c -g -fpic -ffat-lto-objects -flto  -O0 -o foo1.o foo1.c -ffunction-sections
./xgcc -B ./ -shared -g -fpic -flto -O0 -o foo1.so foo1.o
/tmp/ccTW8mBm.debug.temp.o: file not recognized: file format not recognized
(testcase too slow to be included into testsuite).
The problem is clearly reported by readelf:
readelf: foo1.o.debug.temp.o: Warning: Section 2 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 5 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 10 has an out of range sh_link value of 65323
readelf: foo1.o.debug.temp.o: Warning: [ 2]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [ 5]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [10]: Link field (65323) should index a string section.
because simple_object_elf_copy_lto_debug_sections doesn't adjust sh_info and
sh_link fields in ElfNN_Shdr if they are in between SHN_{LO,HI}RESERVE
inclusive.  Not adjusting those is incorrect though, SHN_{LO,HI}RESERVE
range is only relevant to the 16-bit fields, mainly st_shndx in ElfNN_Sym
where if one needs >= SHN_LORESERVE section number, SHN_XINDEX should be
used instead and .symtab_shndx section should contain the real section
index, and in ElfNN_Ehdr e_shnum and e_shstrndx fields, where if >=
SHN_LORESERVE value is needed it should put those into
Shdr[0].sh_{size,link}.  But, sh_{link,info} are 32-bit fields which can
contain any section index.

Note, as simple-object-elf.c mentions, binutils from 2.12 to 2.18 (so before
2011) used to mishandle the > 63.75K sections case and assumed there is a
hole in between the sections, but what
simple_object_elf_copy_lto_debug_sections does wouldn't help in that case
for the debug temp object creation, we'd need to detect the case also in
that routine and take it into account in the remapping etc.  I think
it is not worth it given that it is over 10 years, if somebody needs
63.75K or more sections, better use more recent binutils.

2022-02-22  Jakub Jelinek  <jakub@redhat.com>

	PR lto/104617
	* simple-object-elf.c (simple_object_elf_match): Fix up URL
	in comment.
	(simple_object_elf_copy_lto_debug_sections): Remap sh_info and
	sh_link even if they are in the SHN_LORESERVE .. SHN_HIRESERVE
	range (inclusive).
2022-02-22 11:33:45 +01:00
Jakub Jelinek
d44dc131f4 ranger: Fix up REALPART_EXPR/IMAGPART_EXPR handling [PR104604]
The following testcase is miscompiled since r12-3328.
That change assumed that if rhs1 of a GIMPLE_ASSIGN is COMPLEX_CST, then
that is the value of the lhs of the stmt, but that is not the case always,
only if it is a GIMPLE_SINGLE_RHS stmt.  If it is e.g.
GIMPLE_UNARY_RHS or GIMPLE_BINARY_RHS (the latter happens in the testcase),
then it can be e.g.
__complex__ (3, 0) / var
and the REALPART_EXPR of that isn't 3, but the realpart of the division.
I assume once the ranger can do complex numbers adjust_*part_expr will just
fetch one or the other range from a underlying complex range, but until
then, we should limit this to what r12-3328 meant to do.

2022-02-22  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/104604
	* gimple-range-fold.cc (adjust_imagpart_expr, adjust_realpart_expr):
	Only check if gimple_assign_rhs1 is COMPLEX_CST if
	gimple_assign_rhs_code is COMPLEX_CST.

	* gcc.c-torture/execute/pr104604.c: New test.
2022-02-22 10:43:13 +01:00
Jakub Jelinek
7e691189ca i386: Fix up copysign/xorsign expansion [PR104612]
We ICE on the following testcase for -m32 since r12-3435. because
operands[2] is (subreg:SF (reg:DI ...) 0) and
lowpart_subreg (V4SFmode, operands[2], SFmode)
returns NULL, and that is what we use in AND etc. insns we emit.

My earlier version of the patch fixes that by calling force_reg for the
input operands, to make sure they are really REGs and so lowpart_subreg
will succeed on them - even for theoretical MEMs using REGs there seems
desirable, we don't want to read following memory slots for the paradoxical
subreg.  For the outputs, I thought we'd get better code by always computing
result into a new pseudo and them move lowpart of that pseudo into dest.

Unfortunately it regressed
FAIL: gcc.target/i386/pr89984-2.c scan-assembler-not vmovaps
on which the patch changes:
        vandps  .LC0(%rip), %xmm1, %xmm1
-       vxorps  %xmm0, %xmm1, %xmm0
+       vxorps  %xmm0, %xmm1, %xmm1
+       vmovaps %xmm1, %xmm0
        ret
The RA sees:
(insn 8 4 9 2 (set (reg:V4SF 85)
        (and:V4SF (subreg:V4SF (reg:SF 90) 0)
            (mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S16 A128]))) "pr89984-2.c":7:12 2838 {*andv4sf3}
     (expr_list:REG_DEAD (reg:SF 90)
        (nil)))
(insn 9 8 10 2 (set (reg:V4SF 87)
        (xor:V4SF (reg:V4SF 85)
            (subreg:V4SF (reg:SF 89) 0))) "pr89984-2.c":7:12 2842 {*xorv4sf3}
     (expr_list:REG_DEAD (reg:SF 89)
        (expr_list:REG_DEAD (reg:V4SF 85)
            (nil))))
(insn 10 9 14 2 (set (reg:SF 82 [ <retval> ])
        (subreg:SF (reg:V4SF 87) 0)) "pr89984-2.c":7:12 142 {*movsf_internal}
     (expr_list:REG_DEAD (reg:V4SF 87)
        (nil)))
(insn 14 10 15 2 (set (reg/i:SF 20 xmm0)
        (reg:SF 82 [ <retval> ])) "pr89984-2.c":8:1 142 {*movsf_internal}
     (expr_list:REG_DEAD (reg:SF 82 [ <retval> ])
        (nil)))
(insn 15 14 0 2 (use (reg/i:SF 20 xmm0)) "pr89984-2.c":8:1 -1
     (nil))
and doesn't know that if it would use xmm0 not just for pseudo 82
but also for pseudo 87, it could create a noop move in insn 10 and
so could avoid an extra register copy and nothing later on is able
to figure that out either.  I don't know how the RA should know
that though.

So that we don't regress, this version of the patch
will do this stuff (i.e. use fresh vector pseudo as destination and
then move lowpart of that to dest) over what it used before (i.e.
use paradoxical subreg of the dest) only if lowpart_subreg returns NULL.

2022-02-22  Jakub Jelinek  <jakub@redhat.com>

	PR target/104612
	* config/i386/i386-expand.cc (ix86_expand_copysign): Call force_reg
	on input operands before calling lowpart_subreg on it.  For output
	operand, use a vmode pseudo as destination and then move its lowpart
	subreg into operands[0] if lowpart_subreg fails on dest.
	(ix86_expand_xorsign): Likewise.

	* gcc.dg/pr104612.c: New test.
2022-02-22 10:38:37 +01:00
Tom de Vries
6263b656c8 [libgomp, testsuite, nvptx] Fix pr96390.c without CUDA
When running the libgomp testsuite on x86_64 with nvptx accelerator, we run into:
...
XPASS: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors)
FAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c execution test
...

The problem is that we're expecting the following ptxas error:
...
XFAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors)
Excess errors:
ptxas /tmp/ccZYDw8N.o, line 90; error   : Call to 'baz' requires call prototype
ptxas /tmp/ccZYDw8N.o, line 90; error   : Unknown symbol 'baz'
...

But it's not triggered because ptxas is not in the path, so nvptx-none-as
defaults to --no-verify.

So instead, we run into the same error at execution time.

Fix this by forcing verification using:
...
/* { dg-additional-options "-foffload=-Wa,--verify" \
     { target offload_target_nvptx } } */
...
such that we run into the xfail in this way instead:
...
XFAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors)
Excess errors:
nvptx-as: error trying to exec 'ptxas': execvp: No such file or directory
nvptx-as: ptxas returned 255 exit status
...

Tested on x86_64-linux with nvptx accelerator.

libgomp/ChangeLog:

2022-02-21  Tom de Vries  <tdevries@suse.de>

	PR testsuite/104146
	* testsuite/libgomp.c++/pr96390.C: Add additional-option
	-foffload=-Wa,--verify for nvptx.
	* testsuite/libgomp.c-c++-common/pr96390.c: Same.
2022-02-22 10:23:20 +01:00
Tom de Vries
f0ae4257e3 [nvptx] Xfail sibcall execution tests
On nvptx I see the following FAIL:
...
FAIL: gcc.dg/sibcall-3.c execution test
...

The test-case states that "this test is xfailed on targets without sibcall
patterns".

The nvptx port doesn't have a sibcall pattern, so add an xfail.  Likewise in
two similar test-cases.

Tested on nvptx.

gcc/testsuite/ChangeLog:

2022-02-20  Tom de Vries  <tdevries@suse.de>

	* gcc.dg/sibcall-10.c: Xfail execution test for nvptx.
	* gcc.dg/sibcall-3.c: Same.
	* gcc.dg/sibcall-4.c: Same.
2022-02-22 10:14:59 +01:00
Tom de Vries
7d3e649895 [nvptx, testsuite] Remove mptx settings in gcc.target/nvptx tests
Some test-cases in gcc/testsuite/gcc.target/nvptx contain mptx
settings, which are paired with misa settings, in order to have the mptx
version support the misa version.

Since commit decde11183bd ("[nvptx] Choose -mptx default based on -misa"),
this is no longer necessary.

Remove the mptx settings.

Tested on nvptx.

gcc/testsuite/ChangeLog:

2022-02-20  Tom de Vries  <tdevries@suse.de>

	* gcc.target/nvptx/float16-1.c: Drop -mptx setting.
	* gcc.target/nvptx/float16-2.c: Same.
	* gcc.target/nvptx/float16-3.c: Same.
	* gcc.target/nvptx/float16-4.c: Same.
	* gcc.target/nvptx/float16-5.c: Same.
	* gcc.target/nvptx/float16-6.c: Same.
	* gcc.target/nvptx/tanh-1.c: Same.
2022-02-22 10:14:18 +01:00
Richard Biener
90d693bdc9 target/99881 - x86 vector cost of CTOR from integer regs
This uses the now passed SLP node to the vectorizer costing hook
to adjust vector construction costs for the cost of moving an
integer component from a GPR to a vector register when that's
required for building a vector from components.  A cruical difference
here is whether the component is loaded from memory or extracted
from a vector register as in those cases no intermediate GPR is involved.

The pr99881.c testcase can be Un-XFAILed with this patch, the
pr91446.c testcase now produces scalar code which looks superior
to me so I've adjusted it as well.

2022-02-18  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/104582
	PR target/99881
	* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
	Cost GPR to vector register moves for integer vector construction.

	* gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-1.c: New.
	* gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-2.c: Likewise.
	* gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-3.c: Likewise.
	* gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-4.c: Likewise.
	* gcc.target/i386/pr99881.c: Un-XFAIL.
	* gcc.target/i386/pr91446.c: Adjust to not expect vectorization.
2022-02-22 07:48:45 +01:00
Richard Biener
f24dfc7617 tree-optimization/104582 - make SLP node available in vector cost hook
This adjusts the vectorizer costing API to allow passing down the
SLP node the vector stmt is created from.

2022-02-18  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/104582
	* tree-vectorizer.h (stmt_info_for_cost::node): New field.
	(vector_costs::add_stmt_cost): Add SLP node parameter.
	(dump_stmt_cost): Likewise.
	(add_stmt_cost): Likewise, new overload and adjust.
	(add_stmt_costs): Adjust.
	(record_stmt_cost): New overload.
	* tree-vectorizer.cc (dump_stmt_cost): Dump the SLP node.
	(vector_costs::add_stmt_cost): Adjust.
	* tree-vect-loop.cc (vect_estimate_min_profitable_iters):
	Adjust.
	* tree-vect-slp.cc (vect_prologue_cost_for_slp): Record
	the SLP node for costing.
	(vectorizable_slp_permutation): Likewise.
	* tree-vect-stmts.cc (record_stmt_cost): Adjust and add
	new overloads.
	* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
	Adjust.
	* config/aarch64/aarch64.cc (aarch64_vector_costs::add_stmt_cost):
	Adjust.
	* config/rs6000/rs6000.cc (rs6000_vector_costs::add_stmt_cost):
	Adjust.
	(rs6000_cost_data::adjust_vect_cost_per_loop): Likewise.
2022-02-22 07:48:38 +01:00
Richard Biener
61fc5e098e tree-optimization/104582 - Simplify vectorizer cost API and fixes
This simplifies the vectorizer cost API by providing overloads
to add_stmt_cost and record_stmt_cost suitable for scalar stmt
and branch stmt costing which do not need information like
a vector type or alignment.  It also fixes two mistakes where
costs for versioning tests were recorded as vector stmt rather
than scalar stmt.

This is a first patch to simplify the actual fix for PR104582.

2022-02-18  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/104582
	* tree-vectorizer.h (add_stmt_cost): New overload.
	(record_stmt_cost): Likewise.
	* tree-vect-loop.cc (vect_compute_single_scalar_iteration_cost):
	Use add_stmt_costs.
	(vect_get_known_peeling_cost): Use new overloads.
	(vect_estimate_min_profitable_iters): Likewise.  Consistently
	use scalar_stmt for costing versioning checks.
	* tree-vect-stmts.cc (record_stmt_cost): New overload.
2022-02-22 07:48:31 +01:00
Hongyu Wang
0435b978f9 i386: Relax cmpxchg instruction under -mrelax-cmpxchg-loop [PR103069]
For cmpxchg, it is commonly used in spin loop, and several user code
such as pthread directly takes cmpxchg as loop condition, which cause
huge cache bouncing.

This patch extends previous implementation to relax all cmpxchg
instruction under -mrelax-cmpxchg-loop with an extra atomic load,
compare and emulate the failed cmpxchg behavior.

For original spin loop which looks like

loop: mov    %eax,%r8d
      or     $1,%r8d
      lock cmpxchg %r8d,(%rdi)
      jne    loop

It will now truns to

loop: mov    %eax,%r8d
      or     $1,%r8d
      mov    (%r8),%rsi <--- load lock first
      cmp    %rsi,%rax <--- compare with expected input
      jne    .L2 <--- lock ne expected
      lock cmpxchg %r8d,(%rdi)
      jne    loop
  L2: mov    %rsi,%rax <--- perform the behavior of failed cmpxchg
      jne    loop

under -mrelax-cmpxchg-loop.

gcc/ChangeLog:

	PR target/103069
	* config/i386/i386-expand.cc (ix86_expand_atomic_fetch_op_loop):
	Split atomic fetch and loop part.
	(ix86_expand_cmpxchg_loop): New expander for cmpxchg loop.
	* config/i386/i386-protos.h (ix86_expand_cmpxchg_loop): New
	prototype.
	* config/i386/sync.md (atomic_compare_and_swap<mode>): Call new
	expander under TARGET_RELAX_CMPXCHG_LOOP.
	(atomic_compare_and_swap<mode>): Likewise for doubleword modes.

gcc/testsuite/ChangeLog:

	PR target/103069
	* gcc.target/i386/pr103069-2.c: Adjust result check.
	* gcc.target/i386/pr103069-3.c: New test.
	* gcc.target/i386/pr103069-4.c: Likewise.
2022-02-22 09:30:19 +08:00
GCC Administrator
5c105adbf2 Daily bump. 2022-02-22 00:16:33 +00:00
Ian Lance Taylor
a7eeaa4898 runtime/internal/syscall: build dummy package if not Linux
Fixes libgo build on non-Linux systems.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/387134
2022-02-21 13:24:38 -08:00
Dan Li
ce09ab17dd aarch64: Add compiler support for Shadow Call Stack
Shadow Call Stack can be used to protect the return address of a
function at runtime, and clang already supports this feature[1].

To enable SCS in user mode, in addition to compiler, other support
is also required (as discussed in [2]). This patch only adds basic
support for SCS from the compiler side, and provides convenience
for users to enable SCS.

For linux kernel, only the support of the compiler is required.

[1] https://clang.llvm.org/docs/ShadowCallStack.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768

Signed-off-by: Dan Li <ashimida@linux.alibaba.com>

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (SLOT_REQUIRED):
	Change wb_candidate[12] to wb_push_candidate[12].
	(aarch64_layout_frame): Likewise, and
	change callee_adjust when scs is enabled.
	(aarch64_save_callee_saves):
	Change wb_candidate[12] to wb_push_candidate[12].
	(aarch64_restore_callee_saves):
	Change wb_candidate[12] to wb_pop_candidate[12].
	(aarch64_get_separate_components):
	Change wb_candidate[12] to wb_push_candidate[12].
	(aarch64_expand_prologue): Push x30 onto SCS before it's
	pushed onto stack.
	(aarch64_expand_epilogue): Pop x30 frome SCS, while
	preventing it from being popped from the regular stack again.
	(aarch64_override_options_internal): Add SCS compile option check.
	(TARGET_HAVE_SHADOW_CALL_STACK): New hook.
	* config/aarch64/aarch64.h (struct GTY): Add is_scs_enabled,
	wb_pop_candidate[12], and rename wb_candidate[12] to
	wb_push_candidate[12].
	* config/aarch64/aarch64.md (scs_push): New template.
	(scs_pop): Likewise.
	* doc/invoke.texi: Document -fsanitize=shadow-call-stack.
	* doc/tm.texi: Regenerate.
	* doc/tm.texi.in: Add hook have_shadow_call_stack.
	* flag-types.h (enum sanitize_code):
	Add SANITIZE_SHADOW_CALL_STACK.
	* opts.cc (parse_sanitizer_options): Add shadow-call-stack
	and exclude SANITIZE_SHADOW_CALL_STACK.
	* target.def: New hook.
	* toplev.cc (process_options): Add SCS compile option check.
	* ubsan.cc (ubsan_expand_null_ifn): Enum type conversion.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/shadow_call_stack_1.c: New test.
	* gcc.target/aarch64/shadow_call_stack_2.c: New test.
	* gcc.target/aarch64/shadow_call_stack_3.c: New test.
	* gcc.target/aarch64/shadow_call_stack_4.c: New test.
	* gcc.target/aarch64/shadow_call_stack_5.c: New test.
	* gcc.target/aarch64/shadow_call_stack_6.c: New test.
	* gcc.target/aarch64/shadow_call_stack_7.c: New test.
	* gcc.target/aarch64/shadow_call_stack_8.c: New test.
2022-02-21 20:01:14 +00:00
Tom de Vries
02aedc6f26 [nvptx] Initialize ptx regs
With nvptx target, driver version 510.47.03 and board GT 1030 I, we run into:
...
FAIL: gcc.c-torture/execute/pr53465.c -O1 execution test
FAIL: gcc.c-torture/execute/pr53465.c -O2 execution test
FAIL: gcc.c-torture/execute/pr53465.c -O3 -g execution test
...
while the test-cases pass with nvptx-none-run -O0.

The problem is that the generated ptx contains a read from an uninitialized
ptx register, and the driver JIT doesn't handle this well.

For -O2 and -O3, we can get rid of the FAIL using --param
logical-op-non-short-circuit=0.  But not for -O1.

At -O1, the test-case minimizes to:
...
void __attribute__((noinline, noclone))
foo (int y) {
  int c;
  for (int i = 0; i < y; i++)
    {
      int d = i + 1;
      if (i && d <= c)
        __builtin_abort ();
      c = d;
    }
}

int main () {
  foo (2); return 0;
}
...

Note that the test-case does not contain an uninitialized use.  In the first
iteration, i is 0 and consequently c is not read.  In the second iteration, c
is read, but by that time it's already initialized by 'c = d' from the first
iteration.

AFAICT the problem is introduced as follows: the conditional use of c in the
loop body is translated into an unconditional use of c in the loop header:
...
  # c_1 = PHI <c_4(D)(2), c_9(6)>
...
which forwprop1 propagates the 'c_9 = d_7' assignment into:
...
  # c_1 = PHI <c_4(D)(2), d_7(6)>
...
which ends up being translated by expand into an unconditional:
...
(insn 13 12 0 (set (reg/v:SI 22 [ c ])
        (reg/v:SI 23 [ d ])) -1
     (nil))
...
at the start of the loop body, creating an uninitialized read of d on the
path from loop entry.

By disabling coalesce_ssa_name, we get the more usual copies on the incoming
edges.  The copy on the loop entry path still does an uninitialized read, but
that one's now initialized by init-regs.  The test-case passes, also when
disabling init-regs, so it's possible that the JIT driver doesn't object to
this type of uninitialized read.

Now that we characterized the problem to some degree, we need to fix this,
because either:
- we're violating an undocumented ptx invariant, and this is a compiler bug,
  or
- this is is a driver JIT bug and we need to work around it.

There are essentially two strategies to address this:
- stop the compiler from creating uninitialized reads
- patch up uninitialized reads using additional initialization

The former will probably involve:
- making some optimizations more conservative in the presence of
  uninitialized reads, and
- disabling some other optimizations (where making them more conservative is
  not possible, or cannot easily be achieved).
This will probably will have a cost penalty for code that does not suffer from
the original problem.

The latter has the problem that it may paper over uninitialized reads
in the source code, or indeed over ones that were incorrectly introduced
by the compiler.  But it has the advantage that it allows for the problem to
be addressed at a single location.

There's an existing pass, init-regs, which implements a form of the latter,
but it doesn't work for this example because it only inserts additional
initialization for uses that have not a single reaching definition.

Fix this by adding initialization of uninitialized ptx regs in reorg.

Control the new functionality using -minit-regs=<0|1|2|3>, meaning:
- 0: disabled.
- 1: add initialization of all regs at the entry bb
- 2: add initialization of uninitialized regs at the entry bb
- 3: add initialization of uninitialized regs close to the use
and defaulting to 3.

Tested on nvptx.

gcc/ChangeLog:

2022-02-17  Tom de Vries  <tdevries@suse.de>

	PR target/104440
	* config/nvptx/nvptx.cc (workaround_uninit_method_1)
	(workaround_uninit_method_2, workaround_uninit_method_3)
	(workaround_uninit): New function.
	(nvptx_reorg): Use workaround_uninit.
	* config/nvptx/nvptx.opt (minit-regs): New option.
2022-02-21 16:49:37 +01:00
Patrick Palka
e74d764e17 c++: Add testcase for already fixed PR [PR85493]
The a1 and a2 case were fixed (by diagnosing the invalid expression)
with r11-434, and the a3 case with r8-7625.

	PR c++/85493

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/decltype80.C: New test.
2022-02-21 09:20:23 -05:00
Andre Vieira
d34cdec567 rtl-optimization/104498: Fix comparing symbol reference
gcc/ChangeLog:

	PR rtl-optimization/104498
	* alias.cc (compare_base_symbol_refs): Correct distance computation
	when swapping x and y.
2022-02-21 09:41:53 +00:00
Andrew Pinski
e01530ec1e c: [PR104506] Fix ICE after error due to change of type to error_mark_node
The problem here is we end up with an error_mark_node when calling
useless_type_conversion_p and that ICEs. STRIP_NOPS/tree_nop_conversion
has had a check for the inner type being an error_mark_node since g9a6bb3f78c96
(2000). This just adds the check also to tree_ssa_useless_type_conversion.
STRIP_USELESS_TYPE_CONVERSION is mostly used inside the gimplifier
and the places where it is used outside of the gimplifier would not
be adding too much overhead.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

	PR c/104506

gcc/ChangeLog:

	* tree-ssa.cc (tree_ssa_useless_type_conversion):
	Check the inner type before calling useless_type_conversion_p.

gcc/testsuite/ChangeLog:

	* gcc.dg/pr104506-1.c: New test.
	* gcc.dg/pr104506-2.c: New test.
	* gcc.dg/pr104506-3.c: New test.
2022-02-21 09:05:50 +00:00
GCC Administrator
c42f1e7734 Daily bump. 2022-02-21 00:16:24 +00:00
Iain Buclaw
1d98337c6b d: Remove handling of deleting GC allocated classes.
Now that the `delete' keyword has been removed from the front-end, only
compiler-generated uses of DeleteExp reach the code generator via the
auto-destruction of `scope class' variables.

The run-time library helpers that previously were used to delete GC
class objects can now be removed from the compiler.

gcc/d/ChangeLog:

	* expr.cc (ExprVisitor::visit (DeleteExp *)): Remove handling of
	deleting GC allocated classes.
	* runtime.def (DELCLASS): Remove.
	(DELINTERFACE): Remove.
2022-02-21 00:12:01 +01:00
Iain Buclaw
6384eff56d d: Merge upstream dmd cb49e99f8, druntime 55528bd1, phobos 1a3e80ec2.
D front-end changes:

    - Import dmd v2.099.0-beta.1.
    - It's now an error to use `alias this' for partial assignment.
    - The `delete' keyword has been removed from the language.
    - Using `this' and `super' as types has been removed from the
      language, the parser no longer specially handles this wrong code
      with an informative error.

D Runtime changes:

    - Import druntime v2.099.0-beta.1.

Phobos changes:

    - Import phobos v2.099.0-beta.1.

gcc/d/ChangeLog:

	* dmd/MERGE: Merge upstream dmd cb49e99f8.
	* dmd/VERSION: Update version to v2.099.0-beta.1.
	* decl.cc (layout_class_initializer): Update call to NewExp::create.
	* expr.cc (ExprVisitor::visit (DeleteExp *)): Remove handling of
	deleting arrays and pointers.
	(ExprVisitor::visit (DotVarExp *)): Convert complex types to the
	front-end library type representing them.
	(ExprVisitor::visit (StringExp *)): Use getCodeUnit instead of charAt
	to get the value of each index in a string expression.
	* runtime.def (DELMEMORY): Remove.
	(DELARRAYT): Remove.
	* types.cc (TypeVisitor::visit (TypeEnum *)): Handle anonymous enums.

libphobos/ChangeLog:

	* libdruntime/MERGE: Merge upstream druntime 55528bd1.
	* src/MERGE: Merge upstream phobos 1a3e80ec2.
	* testsuite/libphobos.hash/test_hash.d: Update.
	* testsuite/libphobos.betterc/test19933.d: New test.
2022-02-20 23:37:32 +01:00
Harald Anlauf
e49508ac6b Fortran: improve check of pointer initialization in DATA statements
gcc/fortran/ChangeLog:

	PR fortran/77693
	* data.cc (gfc_assign_data_value): If a variable in a data
	statement has the POINTER attribute, check for allowed initial
	data target that is compatible with pointer assignment.
	* gfortran.h (IS_POINTER): New macro.

gcc/testsuite/ChangeLog:

	PR fortran/77693
	* gfortran.dg/data_pointer_2.f90: New test.
2022-02-20 22:34:21 +01:00
GCC Administrator
1f96b5eeef Daily bump. 2022-02-20 00:16:22 +00:00