Commit Graph

191632 Commits

Author SHA1 Message Date
Jason Merrill
8a37897862 c++: lambda in template default argument [PR103186]
The problem with this testcase was that since my patch for PR97900 we
weren't preserving DECL_UID identity for parameters of instantiations of
templated functions, so using those parameters as the keys for the
defarg_inst map broke.  I think this was always fragile given the
possibility of redeclarations, so instead of reverting that change let's
switch to keying off the function.

Memory use compiling stdc++.h is not noticeably different.

	PR c++/103186

gcc/cp/ChangeLog:

	* pt.cc (defarg_inst): Use tree_vec_map_cache_hasher.
	(defarg_insts_for): New.
	(tsubst_default_argument): Adjust.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/lambda/lambda-defarg10.C: New test.
2022-02-01 14:14:12 -05:00
Jason Merrill
b649071d4b tree: move tree_vec_map_cache_hasher into header
gcc/ChangeLog:

	* tree.h (struct tree_vec_map_cache_hasher): Move from...
	* tree.cc (struct tree_vec_map_cache_hasher): ...here.
2022-02-01 14:14:12 -05:00
Tom de Vries
f32f74c2e8 [nvptx] Add uniform_warp_check insn
On a GT 1030, with driver version 470.94 and -mptx=3.1 I run into:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \
  -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \
  -O2 execution test
...
which minimizes to the same test-case as listed in commit "[nvptx]
Update default ptx isa to 6.3".

The problem is again that the first diverging branch is not handled as such in
SASS, which causes problems with a subsequent shfl insn, but given that we
have -mptx=3.1 we can't use the bar.warp.sync insn.

Given that the default is now -mptx=6.3, and consequently -mptx=3.1 is of a
lesser importance, implement the next best thing: abort when detecting
non-convergence using this insn:
...
  { .reg.b32 act;
    vote.ballot.b32 act,1;
    .reg.pred uni;
    setp.eq.b32 uni,act,0xffffffff;
    @ !uni trap;
    @ !uni exit;
  }
...

Interestingly, the effect of this is that rather than aborting, the test-case
now passes.

Tested on x86_64 with nvptx accelerator.

gcc/ChangeLog:

2022-01-31  Tom de Vries  <tdevries@suse.de>

	* config/nvptx/nvptx.cc (nvptx_single): Use nvptx_uniform_warp_check.
	* config/nvptx/nvptx.md (define_c_enum "unspecv"): Add
	UNSPECV_UNIFORM_WARP_CHECK.
	(define_insn "nvptx_uniform_warp_check"): New define_insn.
2022-02-01 19:29:01 +01:00
Tom de Vries
bba61d403d [nvptx] Add bar.warp.sync
On a GT 1030 (sm_61), with driver version 470.94 I run into:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \
  -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \
  -O2 execution test
...
which minimizes to the same test-case as listed in commit "[nvptx] Update
default ptx isa to 6.3".

The first divergent branch looks like:
...
  {
    .reg .u32 %x;
    mov.u32 %x,%tid.x;
    setp.ne.u32 %r59,%x,0;
  }
  @ %r59 bra $L15;
  mov.u64 %r48,%ar0;
  mov.u32 %r22,2;
  ld.u64 %r53,[%r48];
  mov.u32 %r55,%r22;
  mov.u32 %r54,1;
 $L15:
...
and when inspecting the generated SASS, the branch is not setup as a divergent
branch, but instead as a regular branch.

This causes us to execute a shfl.sync insn in divergent mode, which is likely
to cause trouble given a remark in the ptx isa version 6.3, which mentions
that for .target sm_6x or below, all threads must excute the same
shfl.sync instruction in convergence.

Fix this by placing a "bar.warp.sync 0xffffffff" at the desired convergence
point (in the example above, after $L15).

Tested on x86_64 with nvptx accelerator.

gcc/ChangeLog:

2022-01-31  Tom de Vries  <tdevries@suse.de>

	* config/nvptx/nvptx.cc (nvptx_single): Use nvptx_warpsync.
	* config/nvptx/nvptx.md (define_c_enum "unspecv"): Add
	UNSPECV_WARPSYNC.
	(define_insn "nvptx_warpsync"): New define_insn.
2022-02-01 19:28:57 +01:00
Tom de Vries
8ff0669f6d [nvptx] Update default ptx isa to 6.3
With the following example, minimized from parallel-dims.c:
...
int
main (void)
{
  int vectors_max = -1;
  #pragma acc parallel num_gangs (1) num_workers (1) copy (vectors_max)
  {
    for (int i = 0; i < 2; i++)
      for (int j = 0; j < 2; j++)
        #pragma acc loop vector reduction (max: vectors_max)
        for (int k = 0; k < 32; k++)
          vectors_max = k;
  }

  if (vectors_max != 31)
    __builtin_abort ();

  return 0;
}
...
I run into (T400, driver version 470.94):
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \
  -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 \
  execution test
...
The FAIL does not happen with GOMP_NVPTX_JIT=-O0.

The problem seems to be that the shfl insns for the vector reduction are not
executed uniformly by the warp.  Enforcing this by using shfl.sync fixes the
problem.

Fix this by setting the ptx isa to 6.3 by default, which allows the use of
shfl.sync.

Tested on x86_64 with nvptx accelerator.

gcc/ChangeLog:

2022-01-27  Tom de Vries  <tdevries@suse.de>

	* config/nvptx/nvptx.opt (mptx): Set to PTX_VERSION_6_3 by default.
2022-02-01 19:28:52 +01:00
Tom de Vries
57f971f992 [nvptx] Update bar.sync for ptx isa 6.0
In ptx isa 6.0, a new barrier instruction was added, and bar.sync was
redefined as barrier.sync.aligned.

The aligned modifier indicates that all threads in a CTA will execute the same
barrier instruction.

The seems fine for a form "bar.sync 0".

But a "bar.sync %rx,64" (as used for vector length > 32) may execute a
diffferent barrier depending on the value of %rx, so we can't assume it's
aligned.

Fix this by using "barrier.sync %rx,64" instead.

Tested on x86_64 with nvptx accelerator.

gcc/ChangeLog:

2022-01-27  Tom de Vries  <tdevries@suse.de>

	* config/nvptx/nvptx-opts.h (enum ptx_version): Add PTX_VERSION_6_0.
	* config/nvptx/nvptx.h (TARGET_PTX_6_0): New macro.
	* config/nvptx/nvptx.md (define_insn "nvptx_barsync"): Use barrier
	insn for TARGET_PTX_6_0.
2022-02-01 19:28:48 +01:00
Tom de Vries
456de10c54 [nvptx] Handle nop in prevent_branch_around_nothing
When running libgomp test-case reduction-7.c on an nvptx accelerator
(T400, driver version 470.86) and GOMP_NVPTX_JIT=-O0, I run into:
...
reduction-7.exe:reduction-7.c:312: v_p_2: \
  Assertion `out[j * 32 + i] == (i + j) * 2' failed.
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/reduction-7.c \
  -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \
  -O0  execution test
...

During investigation I found ptx code like this:
...
@ %r163 bra $L262;
$L262:
...

There's a known problem with executing this type of code, and a workaround is
in place to address this: prevent_branch_around_nothing.  The workaround does
not trigger though because it doesn't handle the nop insn.

Fix this by handling the nop insn in prevent_branch_around_nothing.

Tested libgomp on x86_64 with nvptx accelerator.

gcc/ChangeLog:

2022-01-27  Tom de Vries  <tdevries@suse.de>

	PR target/100428
	* config/nvptx/nvptx.cc (prevent_branch_around_nothing): Handle nop
	insn.
2022-02-01 19:28:39 +01:00
Tom de Vries
e0451f93d9 [nvptx] Add some support for .local atomics
The ptx insn atom doesn't support local memory.  In case of doing an atomic
operation on local memory, we run into:
...
operation not supported on global/shared address space
...
This is the cuGetErrorString message for CUDA_ERROR_INVALID_ADDRESS_SPACE.

The message is somewhat confusing given that actually the operation is not
supported on local address space.

Fix this by falling back on a non-atomic version when detecting
a frame-related memory operand.

This only solves some cases that are detected at compile-time.  It does
however fix the openacc private-atomic-* test-cases.

Tested on x86_64 with nvptx accelerator.

gcc/ChangeLog:

2022-01-27  Tom de Vries  <tdevries@suse.de>

	* config/nvptx/nvptx.md (define_insn "atomic_compare_and_swap<mode>_1")
	(define_insn "atomic_exchange<mode>")
	(define_insn "atomic_fetch_add<mode>")
	(define_insn "atomic_fetch_addsf")
	(define_insn "atomic_fetch_<logic><mode>"): Output non-atomic version
	if memory operands is frame-relative.

gcc/testsuite/ChangeLog:

2022-01-31  Tom de Vries  <tdevries@suse.de>

	* gcc.target/nvptx/stack-atomics-run.c: New test.

libgomp/ChangeLog:

2022-01-27  Tom de Vries  <tdevries@suse.de>

	* testsuite/libgomp.oacc-c-c++-common/private-atomic-1.c: Remove
	PR83812 workaround.
	* testsuite/libgomp.oacc-fortran/private-atomic-1-vector.f90: Same.
	* testsuite/libgomp.oacc-fortran/private-atomic-1-worker.f90: Same.
2022-02-01 19:28:24 +01:00
Tom de Vries
ca902055d0 [nvptx] Fix reduction lock
When I run the libgomp test-case reduction-cplx-dbl.c on an nvptx accelerator
(T400, driver version 470.86), I run into:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/reduction-cplx-dbl.c \
  -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  \
  execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/reduction-cplx-dbl.c \
  -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  \
  execution test
...

The problem is in this code generated for a gang reduction:
...
$L39:
		atom.global.cas.b32     %r59, [__reduction_lock], 0, 1;
		setp.ne.u32     %r116, %r59, 0;
	@%r116  bra     $L39;
		ld.f64  %r60, [%r44];
		ld.f64  %r61, [%r44+8];
		ld.f64  %r64, [%r44];
		ld.f64  %r65, [%r44+8];
		add.f64 %r117, %r64, %r22;
		add.f64 %r118, %r65, %r41;
		st.f64  [%r44], %r117;
		st.f64  [%r44+8], %r118;
		atom.global.cas.b32     %r119, [__reduction_lock], 1, 0;
...
which is taking and releasing a lock, but missing the appropriate barriers to
protect the loads and store inside the lock.

Fix this by adding membar.gl barriers.

Likewise, add membar.cta barriers if we protect shared memory loads and
stores (even though the worker-partitioning part of the test-case is not
failing).

Tested on x86_64 with nvptx accelerator.

gcc/ChangeLog:

2022-01-27  Tom de Vries  <tdevries@suse.de>

	* config/nvptx/nvptx.cc (enum nvptx_builtins): Add
	NVPTX_BUILTIN_MEMBAR_GL and NVPTX_BUILTIN_MEMBAR_CTA.
	(VOID): New macro.
	(nvptx_init_builtins): Add MEMBAR_GL and MEMBAR_CTA.
	(nvptx_expand_builtin): Handle NVPTX_BUILTIN_MEMBAR_GL and
	NVPTX_BUILTIN_MEMBAR_CTA.
	(nvptx_lockfull_update): Add level parameter.  Emit barriers.
	(nvptx_reduction_update, nvptx_goacc_reduction_fini): Update call to
	nvptx_lockfull_update.
	* config/nvptx/nvptx.md (define_c_enum "unspecv"): Add
	UNSPECV_MEMBAR_GL.
	(define_expand "nvptx_membar_gl"): New expand.
	(define_insn "*nvptx_membar_gl"): New insn.
2022-02-01 19:28:04 +01:00
Thomas Rodgers
07a971b28c Strengthen memory order for atomic<T>::wait/notify
This matches the memory order in libc++.

libstdc++-v3/ChangeLog:
	* include/bits/atomic_wait.h: Change memory order from
	Acquire/Release with relaxed loads to SeqCst+Release for
	accesses to the waiter's count.
2022-02-01 09:04:10 -08:00
Martin Liska
3ad29854f0 docs: remove --disable-stage1-checking from requirements
As the minimal GCC version that can build the current master
is 4.8, it does not make sense mentioning something for older
versions.

gcc/ChangeLog:

	* doc/install.texi: Remove option for GCC < 4.8.
2022-02-01 17:14:01 +01:00
Jakub Jelinek
e9bf6d6b0e veclower: Fix up -fcompare-debug issue in expand_vector_comparison [PR104307]
The following testcase fails -fcompare-debug, because expand_vector_comparison
since r11-1786-g1ac9258cca8030745d3c0b8f63186f0adf0ebc27 sets
vec_cond_expr_only when it sees some use other than VEC_COND_EXPR that uses
the lhs in its condition.
Obviously we should ignore debug stmts when doing so, e.g. by not pushing
them to uses.
That would be a 2 liner change, but while looking at it, I'm also worried
about VEC_COND_EXPRs that would use the lhs in more than one operand,
like VEC_COND_EXPR <lhs, lhs, something> or VEC_COND_EXPR <lhs, something, lhs>
(sure, they ought to be folded, but what if they weren't).  Because if
something like that happens, then FOR_EACH_IMM_USE_FAST would push the same
stmt multiple times and expand_vector_condition can return true even when
it modifies it (for vector bool masking).
And lastly, it seems quite wasteful to safe_push statements that will just
cause vec_cond_expr_only = false; and break; in the second loop, both for
cases like 1000 immediate non-VEC_COND_EXPR uses and for cases like
999 VEC_COND_EXPRs with lhs in cond followed by a single non-VEC_COND_EXPR
use.  So this patch only pushes VEC_COND_EXPRs there.

2022-02-01  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/104307
	* tree-vect-generic.cc (expand_vector_comparison): Don't push debug
	stmts to uses vector, just set vec_cond_expr_only to false for
	non-VEC_COND_EXPRs instead of pushing them into uses.  Treat
	VEC_COND_EXPRs that use lhs not just in rhs1, but rhs2 or rhs3 too
	like non-VEC_COND_EXPRs.

	* gcc.target/i386/pr104307.c: New test.
2022-02-01 16:02:54 +01:00
Bill Schmidt
7e83607907 rs6000: Don't #ifdef "short" built-in names
It was recently pointed out that we get anomalous behavior when using
__attribute__((target)) to select a CPU.  As an example, when building for
-mcpu=power8 but using __attribute__((target("mcpu=power10")), it is legal
to call __builtin_vec_mod, but not vec_mod, even though these are
equivalent.  This is because the equivalence is established with a #define
that is guarded by #ifdef _ARCH_PWR10.

This goofy behavior occurs with both the old builtins support and the
new.  One of the goals of the new builtins support was to make sure all
appropriate interfaces are available using __attribute__((target)), so I
failed in this respect.  This patch corrects the problem by removing the
ifdef.  Note that in a few cases we use an ifdef in a way that can't be
overridden by __attribute__((target)), and we need to keep those.  For
example, #ifdef __PPU__ is still appropriate.

2022-01-06  Bill Schmidt  <wschmidt@linux.ibm.com>

gcc/
	* config/rs6000/rs6000-overload.def (VEC_ABSD): Remove #ifdef token.
	(VEC_BLENDV): Likewise.
	(VEC_BPERM): Likewise.
	(VEC_CFUGE): Likewise.
	(VEC_CIPHER_BE): Likewise.
	(VEC_CIPHERLAST_BE): Likewise.
	(VEC_CLRL): Likewise.
	(VEC_CLRR): Likewise.
	(VEC_CMPNEZ): Likewise.
	(VEC_CNTLZ): Likewise.
	(VEC_CNTLZM): Likewise.
	(VEC_CNTTZM): Likewise.
	(VEC_CNTLZ_LSBB): Likewise.
	(VEC_CNTM): Likewise.
	(VEC_CNTTZ): Likewise.
	(VEC_CNTTZ_LSBB): Likewise.
	(VEC_CONVERT_4F32_8F16): Likewise.
	(VEC_DIV): Likewise.
	(VEC_DIVE): Likewise.
	(VEC_EQV): Likewise.
	(VEC_EXPANDM): Likewise.
	(VEC_EXTRACT_FP_FROM_SHORTH): Likewise.
	(VEC_EXTRACT_FP_FROM_SHORTL): Likewise.
	(VEC_EXTRACTH): Likewise.
	(VEC_EXTRACTL): Likewise.
	(VEC_EXTRACTM): Likewise.
	(VEC_EXTRACT4B): Likewise.
	(VEC_EXTULX): Likewise.
	(VEC_EXTURX): Likewise.
	(VEC_FIRSTMATCHINDEX): Likewise.
	(VEC_FIRSTMACHOREOSINDEX): Likewise.
	(VEC_FIRSTMISMATCHINDEX): Likewise.
	(VEC_FIRSTMISMATCHOREOSINDEX): Likewise.
	(VEC_GB): Likewise.
	(VEC_GENBM): Likewise.
	(VEC_GENHM): Likewise.
	(VEC_GENWM): Likewise.
	(VEC_GENDM): Likewise.
	(VEC_GENQM): Likewise.
	(VEC_GENPCVM): Likewise.
	(VEC_GNB): Likewise.
	(VEC_INSERTH): Likewise.
	(VEC_INSERTL): Likewise.
	(VEC_INSERT4B): Likewise.
	(VEC_LXVL): Likewise.
	(VEC_MERGEE): Likewise.
	(VEC_MERGEO): Likewise.
	(VEC_MOD): Likewise.
	(VEC_MSUB): Likewise.
	(VEC_MULH): Likewise.
	(VEC_NAND): Likewise.
	(VEC_NCIPHER_BE): Likewise.
	(VEC_NCIPHERLAST_BE): Likewise.
	(VEC_NEARBYINT): Likewise.
	(VEC_NMADD): Likewise.
	(VEC_ORC): Likewise.
	(VEC_PDEP): Likewise.
	(VEC_PERMX): Likewise.
	(VEC_PEXT): Likewise.
	(VEC_POPCNT): Likewise.
	(VEC_PARITY_LSBB): Likewise.
	(VEC_REPLACE_ELT): Likewise.
	(VEC_REPLACE_UN): Likewise.
	(VEC_REVB): Likewise.
	(VEC_RINT): Likewise.
	(VEC_RLMI): Likewise.
	(VEC_RLNM): Likewise.
	(VEC_SBOX_BE): Likewise.
	(VEC_SIGNEXTI): Likewise.
	(VEC_SIGNEXTLL): Likewise.
	(VEC_SIGNEXTQ): Likewise.
	(VEC_SLDB): Likewise.
	(VEC_SLV): Likewise.
	(VEC_SPLATI): Likewise.
	(VEC_SPLATID): Likewise.
	(VEC_SPLATI_INS): Likewise.
	(VEC_SQRT): Likewise.
	(VEC_SRDB): Likewise.
	(VEC_SRV): Likewise.
	(VEC_STRIL): Likewise.
	(VEC_STRIL_P): Likewise.
	(VEC_STRIR): Likewise.
	(VEC_STRIR_P): Likewise.
	(VEC_STXVL): Likewise.
	(VEC_TERNARYLOGIC): Likewise.
	(VEC_TEST_LSBB_ALL_ONES): Likewise.
	(VEC_TEST_LSBB_ALL_ZEROS): Likewise.
	(VEC_VEE): Likewise.
	(VEC_VES): Likewise.
	(VEC_VIE): Likewise.
	(VEC_VPRTYB): Likewise.
	(VEC_VSCEEQ): Likewise.
	(VEC_VSCEGT): Likewise.
	(VEC_VSCELT): Likewise.
	(VEC_VSCEUO): Likewise.
	(VEC_VSEE): Likewise.
	(VEC_VSES): Likewise.
	(VEC_VSIE): Likewise.
	(VEC_VSTDC): Likewise.
	(VEC_VSTDCN): Likewise.
	(VEC_VTDC): Likewise.
	(VEC_XL): Likewise.
	(VEC_XL_BE): Likewise.
	(VEC_XL_LEN_R): Likewise.
	(VEC_XL_SEXT): Likewise.
	(VEC_XL_ZEXT): Likewise.
	(VEC_XST): Likewise.
	(VEC_XST_BE): Likewise.
	(VEC_XST_LEN_R): Likewise.
	(VEC_XST_TRUNC): Likewise.
	(VEC_XXPERMDI): Likewise.
	(VEC_XXSLDWI): Likewise.
	(VEC_TSTSFI_EQ_DD): Likewise.
	(VEC_TSTSFI_EQ_TD): Likewise.
	(VEC_TSTSFI_GT_DD): Likewise.
	(VEC_TSTSFI_GT_TD): Likewise.
	(VEC_TSTSFI_LT_DD): Likewise.
	(VEC_TSTSFI_LT_TD): Likewise.
	(VEC_TSTSFI_OV_DD): Likewise.
	(VEC_TSTSFI_OV_TD): Likewise.
	(VEC_VADDCUQ): Likewise.
	(VEC_VADDECUQ): Likewise.
	(VEC_VADDEUQM): Likewise.
	(VEC_VADDUDM): Likewise.
	(VEC_VADDUQM): Likewise.
	(VEC_VBPERMQ): Likewise.
	(VEC_VCLZB): Likewise.
	(VEC_VCLZD): Likewise.
	(VEC_VCLZH): Likewise.
	(VEC_VCLZW): Likewise.
	(VEC_VCTZB): Likewise.
	(VEC_VCTZD): Likewise.
	(VEC_VCTZH): Likewise.
	(VEC_VCTZW): Likewise.
	(VEC_VEEDP): Likewise.
	(VEC_VEESP): Likewise.
	(VEC_VESDP): Likewise.
	(VEC_VESSP): Likewise.
	(VEC_VIEDP): Likewise.
	(VEC_VIESP): Likewise.
	(VEC_VPKSDSS): Likewise.
	(VEC_VPKSDUS): Likewise.
	(VEC_VPKUDUM): Likewise.
	(VEC_VPKUDUS): Likewise.
	(VEC_VPOPCNT): Likewise.
	(VEC_VPOPCNTB): Likewise.
	(VEC_VPOPCNTD): Likewise.
	(VEC_VPOPCNTH): Likewise.
	(VEC_VPOPCNTW): Likewise.
	(VEC_VPRTYBD): Likewise.
	(VEC_VPRTYBQ): Likewise.
	(VEC_VPRTYBW): Likewise.
	(VEC_VRLD): Likewise.
	(VEC_VSLD): Likewise.
	(VEC_VSRAD): Likewise.
	(VEC_VSRD): Likewise.
	(VEC_VSTDCDP): Likewise.
	(VEC_VSTDCNDP): Likewise.
	(VEC_VSTDCNQP): Likewise.
	(VEC_VSTDCNSP): Likewise.
	(VEC_VSTDCQP): Likewise.
	(VEC_VSTDCSP): Likewise.
	(VEC_VSUBECUQ): Likewise.
	(VEC_VSUBEUQM): Likewise.
	(VEC_VSUBUDM): Likewise.
	(VEC_VSUBUQM): Likewise.
	(VEC_VTDCDP): Likewise.
	(VEC_VTDCSP): Likewise.
	(VEC_VUPKHSW): Likewise.
	(VEC_VUPKLSW): Likewise.
2022-02-01 08:55:48 -06:00
Andreas Krebbel
b9ebf6c330 PR101260 regcprop: Add mode change check for copy reg
When propagating a multi-word register into an access with a smaller
mode the can_change_mode backend hook is already consulted for the
original register.  This however is also required for the intermediate
copy in copy_regno which might use a different register class.

gcc/ChangeLog:

	PR rtl-optimization/101260
	* regcprop.cc (maybe_mode_change): Invoke mode_change_ok also for
	copy_regno.

gcc/testsuite/ChangeLog:

	PR rtl-optimization/101260
	* gcc.target/s390/pr101260.c: New testcase.
2022-02-01 13:33:55 +01:00
Xi Ruoyao
34afa19d29
fold-const: do not fold NaN result from non-NaN operands [PR95115]
These operations should raise an invalid operation exception at runtime.
So they should not be folded during compilation unless -fno-trapping-math
is used.

gcc/
	PR middle-end/95115
	* fold-const.cc (const_binop): Do not fold NaN result from
	  non-NaN operands.

gcc/testsuite
	* gcc.dg/pr95115.c: New test.
2022-02-01 18:20:57 +08:00
Tom de Vries
d43fbc7d3f [libgomp, testsuite] Fix insufficient resources in test-cases
When running libgomp test-case broadcast-many.c on an nvptx accelerator
(T400, driver version 470.86), I run into:
...
libgomp: The Nvidia accelerator has insufficient resources to launch \
  'main$_omp_fn$0' with num_workers = 32 and vector_length = 32; \
  recompile the program with 'num_workers = x and vector_length = y' on \
  that offloaded region or '-fopenacc-dim=y' where x * y <= 896.

FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/broadcast-many.c \
  -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  \
  -O0  execution test
...

The error does not occur when using GOMP_NVPTX_JIT=-O0.

Fix this by using 896 / 32 == 28 workers for ACC_DEVICE_TYPE_nvidia.

Likewise for some other test-cases.

Tested libgomp on x86_64 with nvptx accelerator.

libgomp/ChangeLog:

2022-01-27  Tom de Vries  <tdevries@suse.de>

	* testsuite/libgomp.oacc-c-c++-common/broadcast-many.c: Reduce
	num_workers for nvidia accelerator to fix libgomp error 'insufficient
	resources'.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c:
	Same.
	* testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Same.
2022-02-01 08:15:00 +01:00
Tom de Vries
be362d5e12 [libgomp, testsuite] Reduce recursion depth in declare_target-*.f90
When running the libgomp testsuite with GOMP_NVPTX_JIT=-O0 using an nvptx
accelerator (Nvidia T400, 2GB), I run into:
...
libgomp: cuCtxSynchronize error: unspecified launch failure \
  (perhaps abort was called)

libgomp: cuMemFree_v2 error: unspecified launch failure

libgomp: device finalization failed
FAIL: libgomp.fortran/examples-4/declare_target-1.f90   -O0  execution test
...

The test-case contains:
...
  ! Reduced from 25 to 23, otherwise execution runs out of thread stack on
  ! Nvidia Titan V.
  if (fib (23) /= fib_wrapper (23)) stop 2
...

Fix this by reducing the fib/fib_wrapper argument from 23 to 22.

Same for declare_target-2.f90.

Tested on x86_64 with nvptx accelerator.

libgomp/ChangeLog:

2022-01-27  Tom de Vries  <tdevries@suse.de>

	* testsuite/libgomp.fortran/examples-4/declare_target-1.f90: Reduce
	recursion depth.
	* testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Same.
2022-02-01 08:13:06 +01:00
Tom de Vries
2989516651 [ldist] Don't add lib calls with -fno-tree-loop-distribute-patterns
As mentioned in PR56888 comment 21:
...
-fno-tree-loop-distribute-patterns is the reliable way to not
transform loops into library calls.
...

However, since commit 6f966f0614 ("ldist: Recognize strlen and rawmemchr like
loops") a strlen or rawmemchr library call may be introduced by ldist.

This caused regressions in testcases
gcc.c-torture/execute/builtins/strlen{,-2,-3}.c for nvptx.

Fix this by not calling transform_reduction_loop from
loop_distribution::execute for -fno-tree-loop-distribute-patterns.

Tested regressed test-cases as well as gcc.dg/tree-ssa/ldist-*.c on
nvptx.

gcc/ChangeLog:

2022-01-31  Tom de Vries  <tdevries@suse.de>

	* tree-loop-distribution.cc (generate_reduction_builtin_1): Check for
	-ftree-loop-distribute-patterns.
	(loop_distribution::execute): Don't call transform_reduction_loop for
	-fno-tree-loop-distribute-patterns.

gcc/testsuite/ChangeLog:

2022-01-31  Tom de Vries  <tdevries@suse.de>

	* gcc.dg/tree-ssa/ldist-strlen-4.c: New test.
2022-02-01 08:12:24 +01:00
GCC Administrator
1bb5266257 Daily bump. 2022-02-01 00:16:29 +00:00
Andrew Pinski
691924db0d Fix comment for operand_compare::operand_equal_p.
The OEP_* enums were moved to tree-core.h in
r0-124973-g5e351e960763 but the comment was correct
when it was added added to fold-const.h in
r10-4231-g7f4a8ee03d40. This fixes the reference
to the OEP_* enum to reference tree-core.

Committed as obvious after a bootstrap/test on x86_64-linux.

gcc/ChangeLog:

	* fold-const.h (operand_compare::operand_equal_p):
	Fix comment about OEP_* flags.
2022-01-31 23:26:18 +00:00
Ed Smith-Rowland
43ee212764 MAINTAINERS: Update my email and add myself to the DCO list.
ChangeLog:

2022-01-31  Ed Smith-Rowland  <esmithrowland@gmail.com>

	* MAINTAINERS: Update my email and add myself to the DCO list.
2022-01-31 18:05:40 -05:00
Marek Polacek
874ad5d674 c++: ICE with auto[] and VLA [PR102414]
Here we ICE in unify_array_domain when we're trying to deduce the type
of an array, as in

  auto(*p)[i] = (int(*)[i])0;

but unify_array_domain doesn't arbitrarily complex bounds.  Another
test is, e.g.,

  auto (*b)[0/0] = &a;

where the type of the array is

  <<< Unknown tree: template_type_parm >>>[0:(sizetype) ((ssizetype) (0 / 0) - 1)]

It seems to me that we need not handle these.

	PR c++/102414
	PR c++/101874

gcc/cp/ChangeLog:

	* decl.cc (create_array_type_for_decl): Use template_placeholder_p.
	Sorry on a variable-length array of auto.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp23/auto-array3.C: New test.
	* g++.dg/cpp23/auto-array4.C: New test.
2022-01-31 15:35:59 -05:00
Marek Polacek
b1a8b92f8f c++: Reject union std::initializer_list [PR102434]
Weird things are going to happen if you define your std::initializer_list
as a union.  In this case, we crash in output_constructor_regular_field.

Let's not allow such a definition in the first place.

	PR c++/102434

gcc/cp/ChangeLog:

	* class.cc (finish_struct): Don't allow union initializer_list.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/initlist128.C: New test.
2022-01-31 15:35:20 -05:00
Patrick Palka
76dc465aaf c++: CTAD for class tmpl defined inside partial spec [PR104294]
Here during deduction guide generation for the nested class template
B<char(int)>::C, the computation of outer_args yields the template
arguments relative to the primary template for B (i.e. {char(int)})
but what we really want is those relative to C's enclosing scope, the
partial specialization of B (i.e. {char, int}).

	PR c++/104294

gcc/cp/ChangeLog:

	* pt.cc (ctor_deduction_guides_for): Correct computation of
	outer_args.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp1z/class-deduction106.C: New test.
2022-01-31 15:27:58 -05:00
Patrick Palka
0eb06ee9a4 c++: CONSTRUCTORs are non-deduced contexts [PR104291]
PR c++/104291

gcc/cp/ChangeLog:

	* pt.cc (for_each_template_parm_r) <case CONSTRUCTOR>: Clear
	walk_subtrees if !include_nondeduced_p.  Simplify given that
	cp_walk_subtrees already walks TYPE_PTRMEMFUNC_FN_TYPE_RAW.

gcc/testsuite/ChangeLog:

	* g++.dg/template/partial20.C: New test.
2022-01-31 14:15:01 -05:00
Jakub Jelinek
2cbe5dd54f rs6000: Fix up build of non-glibc/aix/darwin powerpc* targets [PR104298]
As reported by Martin, while David has added OPTION_GLIBC define to aix
and Iain to darwin, all the other non-linux targets now fail because
rs6000.md macro isn't defined.

One possibility is to define this macro in option-defaults.h which on rs6000
targets is included last, then we don't need to define it in aix/darwin
headers and for targets using linux.h or linux64.h it will DTRT too.

The other option is the first 2 hunks + changing the 3
   if (!OPTION_GLIBC)
     FAIL;
cases in rs6000.md to e.g.
 #ifdef OPTION_GLIBC
   if (!OPTION_GLIBC)
 #endif
     FAIL;
or to:
 #ifdef OPTION_GLIBC
   if (!OPTION_GLIBC)
 #else
   if (true)
 #endif
     FAIL;
(the latter case if Richi wants to push the -Wunreachable-code changes for
GCC 13).

2022-01-31  Jakub Jelinek  <jakub@redhat.com>

	PR target/104298
	* config/rs6000/aix.h (OPTION_GLIBC): Remove.
	* config/rs6000/darwin.h (OPTION_GLIBC): Likewise.
	* config/rs6000/option-defaults.h (OPTION_GLIBC): Define to 0
	if not already defined.
2022-01-31 20:08:18 +01:00
Martin Sebor
48d3191e7b Constrain PHI handling in -Wuse-after-free [PR104232].
Resolves:
PR middle-end/104232 - spurious -Wuse-after-free after conditional free

gcc/ChangeLog:

	PR middle-end/104232
	* gimple-ssa-warn-access.cc (pointers_related_p): Add argument.
	Handle PHIs.  Add a synonymous overload.
	(pass_waccess::check_pointer_uses): Call pointers_related_p.

gcc/testsuite/ChangeLog:

	PR middle-end/104232
	* g++.dg/warn/Wuse-after-free4.C: New test.
	* gcc.dg/Wuse-after-free-2.c: New test.
	* gcc.dg/Wuse-after-free-3.c: New test.
2022-01-31 12:04:55 -07:00
Martin Liska
31ab99f7c8 contrib: update analyze_brprob_* scripts.
contrib/ChangeLog:

	* analyze_brprob.py: Support more formatted predict.def file.
	* analyze_brprob_spec.py: Improve output and documentation.
2022-01-31 16:42:15 +01:00
Nick Clifton
f10bec5ffa libiberty: Fix infinite recursion in rust demangler.
libiberty/
	PR demangler/98886
	PR demangler/99935
	* rust-demangle.c (struct rust_demangler): Add a recursion
	counter.
	(demangle_path): Increment/decrement the recursion counter upon
	entry and exit.  Fail if the counter exceeds a fixed limit.
	(demangle_type): Likewise.
	(rust_demangle_callback): Initialise the recursion counter,
	disabling if requested by the option flags.
2022-01-31 14:33:34 +00:00
Pierre-Marie de Rodat
36c155c893 [Ada] doc/share/conf.py: fix string handling
gcc/ada/

	* doc/share/conf.py: Remove spurious call to ".decode()".
2022-01-31 10:46:27 +00:00
Arnaud Charlet
2dbc237e86 [Ada] Fix up handling of ghost units PR104027 #2
gcc/ada/

	PR ada/104027
	* gnat1drv.adb (Gnat1drv): Only call Exit_Program when not
	generating code, otherwise instead go to End_Of_Program.
2022-01-31 10:46:27 +00:00
Jakub Jelinek
263a5944fc testsuite: Fix up tree-ssa/pr103514.c testcase [PR103514]
> > PR tree-optimization/103514
> >     * match.pd (a & b) ^ (a == b) -> !(a | b): New optimization.
> >     * match.pd (a & b) == (a ^ b) -> !(a | b): New optimization.
> >     * gcc.dg/tree-ssa/pr103514.c: Testcase for this optimization.
> >
> > 1) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103514
> Note the bug was filed an fixed during stage3, review just didn't happen in
> a reasonable timeframe.
>
> I'm going to ACK this for the trunk and go ahead and commit it for you.

The testcase FAILs on short-circuit targets like powerpc64le-linux.
While the first 2 functions are identical, the last two look like:
  <bb 2> :
  if (a_5(D) != 0)
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]

  <bb 3> :
  if (b_6(D) != 0)
    goto <bb 5>; [INV]
  else
    goto <bb 4>; [INV]

  <bb 4> :

  <bb 5> :
  # iftmp.1_4 = PHI <1(3), 0(4)>
  _1 = a_5(D) == b_6(D);
  _2 = (int) _1;
  _3 = _2 ^ iftmp.1_4;
  _9 = _2 != iftmp.1_4;
  return _9;
instead of the expected:
  <bb 2> :
  _3 = a_8(D) & b_9(D);
  _4 = (int) _3;
  _5 = a_8(D) == b_9(D);
  _6 = (int) _5;
  _1 = a_8(D) | b_9(D);
  _2 = ~_1;
  _7 = (int) _2;
  _10 = ~_1;
  return _10;
so no wonder it doesn't match.  E.g. x86_64-linux will also use jumps
if it isn't just a && b but a && b && c && d (will do
a & b and c & d tests and jump based on those.

As it is too late to implement this optimization even for the short
circuiting targets this late (not even sure which pass would be best),
this patch just forces non-short-circuiting for the test.

2022-01-31  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/103514
	* gcc.dg/tree-ssa/pr103514.c: Add
	--param logical-op-non-short-circuit=1 to dg-options.
2022-01-31 10:30:58 +01:00
Martin Liska
e97cfaa9f6 d: Fix -Werror=format-diag error.
PR d/104287

gcc/d/ChangeLog:

	* decl.cc (d_finish_decl): Remove trailing dot.
2022-01-31 09:49:41 +01:00
Martin Liska
c99a6eb015 Add mold detection for libs.
libatomic/ChangeLog:

	* acinclude.m4: Detect *_ld_is_mold and use it.
	* configure: Regenerate.

libgomp/ChangeLog:

	* acinclude.m4: Detect *_ld_is_mold and use it.
	* configure: Regenerate.

libitm/ChangeLog:

	* acinclude.m4: Detect *_ld_is_mold and use it.
	* configure: Regenerate.

libstdc++-v3/ChangeLog:

	* acinclude.m4: Detect *_ld_is_mold and use it.
	* configure: Regenerate.
2022-01-31 09:46:44 +01:00
Richard Biener
625f16c798 Fix multiple_of_p behavior with NOP_EXPR
We were passing down the original type to recursive invocations
of multiple_of_p for say (int)(unsigned * unsigned).

2022-01-24  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/100499
	* fold-const.cc (multiple_of_p): Pass the correct type of
	the expression to the recursive invocation of multiple_of_p
	for conversions and use CASE_CONVERT.
2022-01-31 09:38:10 +01:00
Eric Botcazou
23987912dd Use V8+ default in 32-bit mode on SPARC64/Linux
This is what has been done for ages on SPARC/Solaris and makes it possible
to use 64-bit atomic instructions even in 32-bit mode.

gcc/
	PR target/104189
	* config/sparc/linux64.h (TARGET_DEFAULT): Add MASK_V8PLUS.
2022-01-31 09:21:48 +01:00
Eric Botcazou
825e5599f3 Add testcase for incorrect optimization in Ada
gcc/testsuite/
	* gnat.dg/div_zero.adb: New test.
2022-01-31 09:15:30 +01:00
Richard Biener
3c7067cc92 Reduce multiple_of_p uses
There are a few cases where we know we're dealing with (poly-)integer
constants, so remove the use of multiple_of_p in those cases to make
the PR100499 fix less impactful.

2022-01-24  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/100499
	* tree-cfg.cc (verify_gimple_assign_ternary): Use multiple_p
	on poly-ints instead of multiple_of_p.
	* tree-ssa.cc (maybe_rewrite_mem_ref_base): Likewise.
	(non_rewritable_mem_ref_base): Likewise.
	(non_rewritable_lvalue_p): Likewise.
	(execute_update_addresses_taken): Likewise.
2022-01-31 08:55:45 +01:00
GCC Administrator
c67ffc256d Daily bump. 2022-01-31 00:16:28 +00:00
Hans-Peter Nilsson
baf98320ac libstdc++ testsuite: Don't run lwg3464.cc tests on simulators
These tests have always been failing for my autotester running a
cris-elf simulator; when unrestrained they take about 20 minutes each,
compared to the (doubled) timeout of 720 seconds, of a total 2h40min
for the whole of the libstdc++-v3 testsuite.  The tests cover counter
overflow and are already disabled for LP64 targets.

	* testsuite/27_io/basic_istream/get/char/lwg3464.cc: Don't run on
	simulator targets.
	* testsuite/27_io/basic_istream/get/wchar_t/lwg3464.cc: Likewise.
2022-01-30 17:51:02 +01:00
GCC Administrator
d1182631ee Daily bump. 2022-01-30 00:16:20 +00:00
Jakub Jelinek
3d41939c87 testsuite: Fix up tree-ssa/divide-7.c testcase [PR95424]
This test fails everywhere, because ? doesn't match literal ?.
It should use \\? instead.  I've also changed those .s in there.

2022-01-29  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/95424
	* gcc.dg/tree-ssa/divide-7.c: Fix up regexps in scan-tree-dump{,-not}.
2022-01-29 17:55:51 +01:00
Jakub Jelinek
a154487896 match.pd: Fix up 1 / X for unsigned X optimization [PR104280]
On Fri, Jan 28, 2022 at 11:38:23AM -0700, Jeff Law wrote:
> Thanks.  Given the original submission and most of the review work was done
> prior to stage3 closing, I went ahead and installed this on the trunk.

Unfortunately this breaks quite a lot of things.
The main problem is that GIMPLE allows EQ_EXPR etc. only with BOOLEAN_TYPE
or with TYPE_PRECISION == 1 integral type (or vector boolean).
Violating this causes verification failures in tree-cfg.cc in some cases,
in other cases wrong-code issues because before it is verified we e.g.
transform
1U / x
into
x == 1U
and later into
x (because we assume that == type must be one of the above cases and
when it is the same type as the type of the first operand, for boolean-ish
cases it should be equivalent).

Fixed by changing that
(eq @1 { build_one_cst (type); })
into
(convert (eq:boolean_type_node @1 { build_one_cst (type); }))
Note, I'm not 100% sure if :boolean_type_node is required in that case,
I see some spots in match.pd that look exactly like this, while there is
e.g. (convert (le ...)) that supposedly does the right thing too.
The signed integer 1/X case doesn't need changes changes, for
(cond (le ...) ...)
le gets correctly boolean_type_node and cond should use type.
I've also reformatted it, some lines were too long, match.pd uses
indentation by 1 column instead of 2 etc.

2022-01-29  Jakub Jelinek  <jakub@redhat.com>
	    Andrew Pinski  <apinski@marvell.com>

	PR tree-optimization/104279
	PR tree-optimization/104280
	PR tree-optimization/104281
	* match.pd (1 / X -> X == 1 for unsigned X): Build eq with
	boolean_type_node and convert to type.  Formatting fixes.

	* gcc.dg/torture/pr104279.c: New test.
	* gcc.dg/torture/pr104280.c: New test.
	* gcc.dg/torture/pr104281.c: New test.
2022-01-29 17:54:43 +01:00
GCC Administrator
f6f2d6cfec Daily bump. 2022-01-29 00:16:22 +00:00
Yoshinori Sato
06995c2958 sh-linux fix target cpu
sh-linux not supported any SH1 and SH2a little-endian.

gcc

	* config/sh/t-linux (MULTILIB_EXCEPTIONS): Add m1, mb/m1 and m2a.
2022-01-28 17:17:39 -05:00
Navid Rahimi
cb3ac1985a tree-optimization/103514 Missing XOR-EQ-AND Optimization
This patch will add the missed pattern described in bug 103514 [1] to the match.pd. [1] includes proof of correctness for the patch too.

1) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103514

gcc/
	PR tree-optimization/103514
	* match.pd (a & b) ^ (a == b) -> !(a | b): New optimization.
	(a & b) == (a ^ b) -> !(a | b): New optimization.

gcc/testsuite
	* gcc.dg/tree-ssa/pr103514.c: Testcase for this optimization.
2022-01-28 17:13:08 -05:00
Marek Polacek
5d8b422818 doc: Update -Wbidi-chars documentation
gcc/ChangeLog:

	* doc/invoke.texi: Update -Wbidi-chars documentation.
2022-01-28 15:58:41 -05:00
Patrick Palka
e971990cbd c++: bogus warning with value init of const pmf [PR92752]
Here we're emitting a -Wignored-qualifiers warning for an intermediate
compiler-generated cast of nullptr to 'method-type* const' as part of
value initialization of a const pmf.  This patch suppresses the warning
by instead casting to the corresponding unqualified type.

	PR c++/92752

gcc/cp/ChangeLog:

	* typeck.cc (build_ptrmemfunc): Cast a nullptr constant to the
	unqualified pointer type not the qualified one.

gcc/testsuite/ChangeLog:

	* g++.dg/warn/Wignored-qualifiers2.C: New test.

Co-authored-by: Jason Merrill <jason@redhat.com>
2022-01-28 15:41:15 -05:00
Iain Sandoe
3a5fdf986d Darwin, PPC: Fix bootstrap after GLIBC version changes.
A recent patch added tests for OPTION_GLIBC that is defined in
linux.h and linux64.h.  This broke bootstrap for powerpc Darwin.
Fixed by adding a definition to 0 for OPTION_GLIBC.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

gcc/ChangeLog:

	* config/rs6000/darwin.h (OPTION_GLIBC): Define to 0.
2022-01-28 19:17:16 +00:00
Zhao Wei Liew
c2b610e7c6 match.pd: Simplify 1 / X for integer X [PR95424]
This patch implements an optimization for the following C++ code:

int f(int x) {
    return 1 / x;
}

int f(unsigned int x) {
    return 1 / x;
}

Before this patch, x86-64 gcc -std=c++20 -O3 produces the following assembly:

f(int):
    xor edx, edx
    mov eax, 1
    idiv edi
    ret
f(unsigned int):
    xor edx, edx
    mov eax, 1
    div edi
    ret

In comparison, clang++ -std=c++20 -O3 produces the following assembly:

f(int):
    lea ecx, [rdi + 1]
    xor eax, eax
    cmp ecx, 3
    cmovb eax, edi
    ret
f(unsigned int):
    xor eax, eax
    cmp edi, 1
    sete al
    ret

Clang's output is more efficient as it avoids expensive div operations.

With this patch, GCC now produces the following assembly:

f(int):
    lea eax, [rdi + 1]
    cmp eax, 2
    mov eax, 0
    cmovbe eax, edi
    ret
f(unsigned int):
    xor eax, eax
    cmp edi, 1
    sete al
    ret

which is virtually identical to Clang's assembly output. Any slight differences
in the output for f(int) is possibly related to a different missed optimization.

v2: https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587751.html
Changes from v2:
1. Refactor from using a switch statement to using the built-in
if-else statement.

v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587634.html
Changes from v1:
1. Refactor common if conditions.
2. Use build_[minus_]one_cst (type) to get -1/1 of the correct type.
3. Match only for TRUNC_DIV_EXPR and TYPE_PRECISION (type) > 1.

gcc/ChangeLog:

	PR tree-optimization/95424
	* match.pd: Simplify 1 / X where X is an integer.
2022-01-28 13:36:39 -05:00